(57) The present invention provides novel human
nucleotide sequence coding for the amino acid
same in various tissues, analyze their structures and
the genes by the technology of genetic engineering.
responding expression products, elucidate the pathology
of diseases associated with the genes, for example

Description

TECHNICAL FIELD 

The present invention relates to a gene useful as an indicator in the prophylaxis, diagnosis and treatment of diseases
in humans. More particularly, it relates to a novel human gene analogous to rat, mouse, yeast, nematode and
known human genes, among others, and utilizable, after cDNA analysis thereof, chromosome mapping of cDNA and
function analysis of cDNA, in gene diagnosis using said gene and in developing a novel therapeutic method.

BACKGROUND ART

The genetic information of a living thing has been accumulated as sequences (DNA) of four bases, namely A, C, G 
and T, which exist in cell nuclei. Said genetic information has been preserved for line preservation and ontogeny of each 
individual living thing. 

In the case of human beings, the number of said bases is said to be about 3 billion (3 x 10^9) and supposedly there
are 50 to 100 thousand genes therein. Such genetic information serves to maintain biological phenomena in that regulatory
proteins, structural proteins and enzymes are produced via a route in which mRNA is transcribed from a gene
(DNA) and then translated into a protein. Abnormalities in said route from gene to protein translation are considered to
be causative of abnormalities of life supporting systems, for example in cell proliferation and differentiation, hence causative
of various diseases.

As a result of gene analyses so far made, a number of genes which may be expected to serve as useful materials 
in drug development, have been found, for example genes for various receptors such as insulin receptor and LDL 
receptor, genes involved in cell proliferation and differentiation and genes for metabolic enzymes such as proteases, 
ATPase and superoxide dismutases. 
However, analysis of human genes and studies of the functions of the genes analyzed and of the relations between
the genes analyzed and various diseases have only just begun and many points remain unknown. Further analysis of
novel genes, analysis of the functions thereof, studies of the relations between the genes analyzed and diseases, and 
studies for applying the genes analyzed to gene diagnosis or for medicinal purposes, for instance, are therefore desired 
in the relevant art. 

If such a novel human gene as mentioned above can be provided, it will be possible to analyze the level of expression
thereof in each cell and the structure and function thereof and, through expression product analysis and other studies,
it may become possible to reveal the pathogenesis of a disease associated therewith, for example a genopathy or
cancer, or diagnose and treat said disease, for instance. It is an object of the present invention to provide such a novel 
human gene. 

For attaining the above object, the present inventors made intensive investigations and obtained the findings mentioned
below. Based thereon, the present invention has now been completed.

DISCLOSURE OF INVENTION 

Thus, the present inventors synthesized cDNAs based on mRNAs extracted from various tissues, inclusive of
human fetal brain, adult blood vessels and placenta, constructed libraries by inserting them into vectors, allowed colonies
of Escherichia coli transformed with said libraries to form on agar medium, picked up colonies at random, transferred
them to 96-well micro plates and registered a large number of human gene-containing E. coli clones.

Each clone thus registered was cultivated on a small scale, DNA was extracted and purified, the four base-specifically
terminating extension reactions were carried out by the dideoxy chain terminator method using the cDNA
extracted as a template, and the base sequence of the gene was determined over about 400 bases from the 5' terminus
thereof using an automatic DNA sequencer. Based on the thus-obtained base sequence information, a novel family
gene analogous to known genes of animal and plant species such as bacteria, yeasts, nematodes, mice and humans
was searched for.

The method of the above-mentioned cDNA analysis is described in detail in the literature by Fujiwara, one of the
present inventors [Fujiwara, Tsutomu, Saibo Kogaku (Cell Engineering), 14, 645-654 (1995)].

Among this group, there are novel receptors, DNA binding domain-containing transcription regulating factors, signal
transmission system factors, metabolic enzymes and so forth. Based on the homology of the novel gene of the
present invention as obtained by gene analysis to the genes analogous thereto, the product of the gene, hence the function
of the protein, can approximately be estimated by analogy. Furthermore, such functions as enzyme activity and
binding ability can be investigated by inserting the candidate gene into an expression vector to give a recombinant.

According to the present invention, there are provided a novel human gene characterized by containing a nucleotide
sequence coding for an amino acid sequence defined by SEQ ID NO:1, :4, :7, :10, :13, :16, :19, :22, :25, :28, :31,
:34, :37 or :40, a human gene characterized by containing the nucleotide sequence defined by SEQ ID NO:2, :5, :8, :11,
:14, :17, :20, :23, :26, :29, :32, :35, :38 or :41, respectively coding for the amino acid sequence mentioned above, and
a novel human gene characterized by the nucleotide sequence defined by SEQ ID NO:3, :6, :9, :12, :15, :18, :21, :24,
:27, :30, :33, :36, :39 or :42.

The symbols used herein for indicating amino acids, peptides, nucleotides, nucleotide sequences and so on are 
those recommended by IUPAC and IUB or in "Guideline for drafting specifications etc. including nucleotide sequences
or amino acid sequences" (edited by the Japanese Patent Office), or those in conventional use in the relevant field of 
art. 

As specific examples of such gene of the present invention, there may be mentioned genes deducible from the
DNA sequences of the clones designated as "GEN-501D08", "GEN-080G01", "GEN-025F07", "GEN-076C09",
"GEN-331G07", "GEN-163D09", "GEN-078D05TA13", "GEN-423A12", "GEN-092E10", "GEN-428B12", "GEN-073E07",
"GEN-093E05" and "GEN-077A09" shown later herein in Examples 1 to 11. The respective nucleotide sequences are
as shown in the sequence listing.

These clones have an open reading frame comprising nucleotides (nucleic acid) respectively coding for the amino
acids shown in the sequence listing. Their molecular weights were calculated at the values shown later herein in the
respective examples. Hereinafter, these human genes of the present invention are sometimes referred to by the designations
used in Examples 1 to 11.

In the following, the human gene of the present invention is described in further detail. 

As mentioned above, each human gene of the present invention is analogous to rat, mouse, yeast, nematode and
known human genes, among others, and can be utilized in human gene analysis based on the information about the
genes analogous thereto and in studying the function of the gene analyzed and the relation between the gene analyzed
and a disease. It is possible to use said gene in gene diagnosis of the disease associated therewith and in exploitation
studies of said gene for medicinal purposes.

The gene of the present invention is represented in terms of a single-stranded DNA sequence, as shown under
SEQ ID NO:2. It is to be noted, however, that the present invention also includes a DNA sequence complementary to
such a single-stranded DNA sequence and a component comprising both. The sequence of the gene of the present
invention as shown under SEQ ID NO:3n - 1 (where n is an integer of 1 to 14) is merely an example of the codon combination
encoding the respective amino acid residues. The gene of the present invention is not limited thereto but can
of course have a DNA sequence in which the codons are arbitrarily selected and combined for the respective amino
acid residues. The codon selection can be made in the conventional manner, for example taking into consideration the
codon utilization frequencies in the host to be used [Nucl. Acids Res., 9, 43-74 (1981)].
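The numbering convention and the codon degeneracy described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the 3n - 2 and 3n indices are inferred from the SEQ ID lists given earlier (amino acid sequences at :1, :4, ..., :40 and full cDNA sequences at :3, :6, ..., :42), and the codon counts are those of the standard genetic code rather than anything stated in this specification.

```python
# Illustrative sketch; names are hypothetical and not part of the specification.

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter codes; 61 sense codons in total).
CODON_COUNTS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def seq_id_numbers(n: int) -> dict:
    """Return the SEQ ID NOs of the n-th gene (n = 1..14).

    Assumes amino acid sequence = 3n - 2, coding region = 3n - 1,
    full cDNA sequence = 3n.
    """
    if not 1 <= n <= 14:
        raise ValueError("n must be an integer from 1 to 14")
    return {"amino_acid": 3 * n - 2, "coding_region": 3 * n - 1, "full_cdna": 3 * n}


def count_codings(peptide: str) -> int:
    """Count the distinct DNA sequences that encode the given peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]  # KeyError for non-standard residues
    return total


if __name__ == "__main__":
    print(seq_id_numbers(1))
    print(count_codings("MLS"))
```

Even a three-residue peptide such as Met-Leu-Ser can be encoded in several ways, which is why the sequence listed for each coding region is only one of many possible codon combinations.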

The gene of the present invention further includes DNA sequences coding for functional equivalents derived from
the amino acid sequence mentioned above by partial amino acid or amino acid sequence substitution, deletion or addition.
These polypeptides may be produced by spontaneous modification (mutation) or may be obtained by posttranslational
modification or by modifying the natural gene (of the present invention) by a technique of genetic engineering, for
example by site-specific mutagenesis [Methods in Enzymology, 154, p. 350, 367-382 (1987); ibid., 100, p. 468 (1983);
Nucleic Acids Research, 12, p. 9441 (1984); Zoku Seikagaku Jikken Koza (Sequel to Experiments in Biochemistry) 1,
"Idensi Kenkyu-ho (Methods in Gene Research) II", edited by the Japan Biochemical Society, p. 105 (1986)] or synthesizing
mutant DNAs by a chemical synthetic technique such as the phosphotriester method or phosphoramidite method
[J. Am. Chem. Soc., 89, p. 4801 (1967); ibid., 91, p. 3350 (1969); Science, 150, p. 178 (1968); Tetrahedron Lett., 22, p.
1859 (1981); ibid., 24, p. 245 (1983)], or by utilizing the techniques mentioned above in combination.

The protein encoded by the gene of the present invention can be expressed readily and stably by utilizing said 
gene, for example inserting it into a vector for use with a microorganism and cultivating the microorganism thus trans- 
formed. 

The protein obtained by utilizing the gene of the present invention can be used in specific antibody production. In
this case, the protein producible in large quantities by the genetic engineering technique mentioned above can be used
as the component to serve as an antigen. The antibody obtained may be polyclonal or monoclonal and can be advantageously
used in the purification, assay, discrimination or identification of the corresponding protein.

The gene of the present invention can be readily produced based on the sequence information thereof disclosed
herein by using general genetic engineering techniques [cf. e.g. Molecular Cloning, 2nd Ed., Cold Spring Harbor Laboratory
Press (1989); Zoku Seikagaku Jikken Koza, "Idenshi Kenkyu-ho I, II and III", edited by the Japan Biochemical
Society (1986)].

This can be achieved, for example, by selecting a desired clone from a human cDNA library (prepared in the con- 
ventional manner from appropriate cells of origin in which the gene is expressed) using a probe or antibody specific to 
the gene of the present invention [e.g. Proc. Natl. Acad. Sci. USA, 78, 6613 (1981); Science, 222, 778 (1983)]. 
The cells of origin to be used in the above method are, for example, cells or tissues in which the gene in question
is expressed, or cultured cells derived therefrom. Separation of total RNA, separation and purification of mRNA, conversion
to (synthesis of) cDNA, cloning thereof and so on can be carried out by conventional methods. cDNA libraries
are also commercially available and such cDNA libraries, for example various cDNA libraries available from Clontech 
Lab. Inc. can also be used in the above method. 






EP 0 796 913 A2 



Screening of the gene of the present invention from these cDNA libraries can be carried out by the conventional
method mentioned above. These screening methods include, for example, the method comprising selecting a cDNA
clone by immunological screening using an antibody specific to the protein produced by the corresponding cDNA, the
technique of plaque or colony hybridization using probes selectively binding to the desired DNA sequence, or a combination
of these. As regards the probe to be used here, a DNA sequence chemically synthesized based on the information
about the DNA sequence of the present invention is generally used. It is of course possible to use the gene of the
present invention or fragments thereof as the probe.

Furthermore, a sense primer and an antisense primer designed based on the information about the partial amino 
acid sequence of a natural extract isolated and purified from cells or a tissue can be used as probes for screening. 

For obtaining the gene of the present invention, the technique of DNA/RNA amplification by the PCR method [Science,
230, 1350-1354 (1984)] can suitably be employed. Particularly when the full-length cDNA can hardly be obtained
from the library, the RACE method [rapid amplification of cDNA ends; Jikken Igaku (Experimental Medicine), 12 (6), 35-38
(1994)], in particular the 5'RACE method [Frohman, M. A., et al., Proc. Natl. Acad. Sci. USA, 85, 8998-9002 (1988)]
is preferably employed. The primers to be used in such PCR method can be appropriately designed based on the
sequence information of the gene of the present invention as disclosed herein and can be synthesized by a conventional
method.

The amplified DNA/RNA fragment can be isolated and purified by a conventional method as mentioned above, for 
example by gel electrophoresis. 

The nucleotide sequence of the thus-obtained gene of the present invention or any of various DNA fragments can
be determined by a conventional method, for example the dideoxy method [Proc. Natl. Acad. Sci. USA, 74, 5463-5467
(1977)] or the Maxam-Gilbert method [Methods in Enzymology, 65, 499 (1980)]. Such nucleotide sequence determination
can be readily performed using a commercially available sequencing kit as well.

When the gene of the present invention is used and conventional techniques of recombinant DNA technology [see
e.g. Science, 224, p. 1431 (1984); Biochem. Biophys. Res. Comm., 130, p. 692 (1985); Proc. Natl. Acad. Sci. USA, 80,
p. 5990 (1983) and the references cited above] are followed, a recombinant protein can be obtained. More specifically,
said protein can be produced by constructing a recombinant DNA enabling the gene of the present invention to be
expressed in host cells, introducing it into host cells for transformation thereof and cultivating the resulting transformant.

In that case, the host cells may be eukaryotic or prokaryotic. The eukaryotic cells include vertebrate cells, yeast
cells and so on, and the vertebrate cells include, but are not limited to, simian cells named COS cells [Cell, 23, 175-182
(1981)], Chinese hamster ovary cells and a dihydrofolate reductase-deficient cell line derived therefrom [Proc. Natl.
Acad. Sci. USA, 77, 4216-4220 (1980)] and the like, which are frequently used.

As regards the expression vector to be used with vertebrate cells, an expression vector having a promoter located
upstream of the gene to be expressed, RNA splicing sites, a polyadenylation site and a transcription termination
sequence can be generally used. This may further have an origin of replication as necessary. As an example of said
expression vector, there may be mentioned pSV2dhfr [Mol. Cell. Biol., 1, 854 (1981)], which has the SV40 early promoter.
As for the eukaryotic microorganisms, yeasts are generally and frequently used and, among them, yeasts of the
genus Saccharomyces can be used with advantage. As regards the expression vector for use with said yeasts and
other eukaryotic microorganisms, pAM82 [Proc. Natl. Acad. Sci. USA, 80, 1-5 (1983)], which has the acid phosphatase
gene promoter, for instance, can be used.

Furthermore, a prokaryotic gene fused vector can be preferably used as the expression vector for the gene of the
present invention. As specific examples of said vector, there may be mentioned pGEX-2TK and pGEX-4T-2 which have
a GST domain (derived from S. japonicum) with a molecular weight of 26,000.

Escherichia coli and Bacillus subtilis are generally and preferably used as prokaryotic hosts. When these are used
as hosts in the practice of the present invention, an expression plasmid derived from a plasmid vector capable of replicating
in said host organisms and provided in this vector with a promoter and the SD (Shine and Dalgarno) sequence
upstream of said gene for enabling the expression of the gene of the present invention and further provided with an initiation
codon (e.g. ATG) necessary for the initiation of protein synthesis is preferably used. The Escherichia coli strain
K12, among others, is preferably used as the host Escherichia coli, and pBR322 and modified vectors derived therefrom
are generally and preferably used as the vector, while various known strains and vectors can also be used. Examples
of the promoter which can be used are the tryptophan (trp) promoter, lpp promoter, lac promoter and PL/PR
promoter.

The thus-obtained desired recombinant DNA can be introduced into host cells for transformation by using various
general methods. The transformant obtained can be cultured by a conventional method and the culture leads to expression
and production of the desired protein encoded by the gene of the present invention. The medium to be used in said
culture can suitably be selected from among various media in conventional use according to the host cells employed.
The host cells can be cultured under conditions suited for the growth thereof.

In the above manner, the desired recombinant protein is expressed and produced and accumulated or secreted 
within the transformant cells or extracellularly or on the cell membrane. 

The recombinant protein can be separated and purified as desired by various separation procedures utilizing the
physical, chemical and other properties thereof [cf. e.g. "Seikagaku (Biochemistry) Data Book II", pages 1175-1259, 1st
Edition, 1st Printing, published June 23, 1980 by Tokyo Kagaku Dojin; Biochemistry, 25 (25), 8274-8277 (1986); Eur. J.
Biochem., 163, 313-321 (1987)]. Specifically, said procedures include, among others, ordinary reconstitution treatment,
treatment with a protein precipitating agent (salting out), centrifugation, osmotic shock treatment, sonication, ultrafiltration,
various liquid chromatography techniques such as molecular sieve chromatography (gel filtration), adsorption
chromatography, ion exchange chromatography, affinity chromatography and high-performance liquid chromatography
(HPLC), dialysis and combinations thereof. Among them, affinity chromatography utilizing a column with the desired
protein bound thereto is particularly preferred.

Furthermore, on the basis of the sequence information about the gene of the present invention as revealed by the
present invention, for example by utilizing part or the whole of said gene, it is possible to detect the expression of the
gene of the present invention in various human tissues. This can be performed by a conventional method, for example
by RNA amplification by RT-PCR (reverse transcription-polymerase chain reaction) [Kawasaki, E. S., et al., Amplification
of RNA, in PCR Protocol, A guide to methods and applications, Academic Press, Inc., San Diego, 21-27 (1991)], or by
northern blotting analysis [Molecular Cloning, Cold Spring Harbor Laboratory (1989)], with good results.

The primers to be used in employing the above-mentioned PCR method are not limited to any particular ones provided
that they are specific to the gene of the present invention and enable the gene of the present invention alone to
be specifically amplified. They can be designed or selected appropriately based on the gene information provided by the
present invention. They can have a partial sequence comprising about 20 to 30 nucleotides according to the established
practice. Suitable examples are as shown in Examples 1 to 11.
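As a rough illustration of the primer length criterion above, the following sketch checks whether a candidate primer falls in the 20 to 30 nucleotide range and reports two common descriptive values. The function name and the example sequence are hypothetical, and the melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is a general approximation best suited to short oligonucleotides and is not a method taken from this specification. Specificity to the target gene is not assessed here and would require a separate sequence search.

```python
# Illustrative sketch; the function name and example primer are hypothetical.

def primer_stats(primer: str) -> dict:
    """Summarize length, GC fraction and a Wallace-rule Tm for a DNA primer."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        # Wallace rule: Tm (deg C) = 2 * (A + T) + 4 * (G + C); rough estimate only.
        "wallace_tm_c": 2 * at + 4 * gc,
        # "About 20 to 30 nucleotides" as stated in the text above.
        "length_ok": 20 <= len(seq) <= 30,
    }


if __name__ == "__main__":
    print(primer_stats("ATGGCCAAGCTTGGATCCAC"))  # made-up sequence
```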

Thus, the present invention also provides primers and/or probes useful in specifically detecting such novel gene.

By using the novel gene provided by the present invention, it is possible to detect the expression of said gene in
various tissues, analyze the structure and function thereof and, further, produce the human protein encoded by said
gene in the manner of genetic engineering. These make it possible to analyze the expression product, reveal the pathology
of a disease associated therewith, for example a genopathy or cancer, and diagnose and treat the disease.

The following drawings are referred to in the examples.

Fig. 1 shows the result obtained by testing the PI4 kinase activity of NPIK in Example 9. Fig. 2 shows the effect of 
Triton X-100 and adenosine on NPIK activity. 

EXAMPLES 


The following examples illustrate the present invention in further detail. 

Example 1 

GDP dissociation stimulator gene

(1) Cloning and DNA sequencing of GDP dissociation stimulator gene 

mRNAs extracted from the tissues of human fetal brain, adult blood vessels and placenta were purchased from
Clontech and used as starting materials.

cDNA was synthesized from each mRNA and inserted into the vector λZAPII (Stratagene) to thereby construct a
cDNA library (Otsuka GEN Research Institute, Otsuka Pharmaceutical Co., Ltd.).

Human gene-containing Escherichia coli colonies were allowed to form on agar medium by the in vivo excision
technique [Short, J. M., et al., Nucleic Acids Res., 16, 7583-7600 (1988)]. Colonies were picked up at random and
human gene-containing Escherichia coli clones were registered on 96-well micro plates. The clones registered were
stored at -80°C.

Each of the clones registered was cultured overnight in 1.5 ml of LB medium, and DNA was extracted and purified
using a model PI-100 automatic plasmid extractor (Kurabo). Contaminant Escherichia coli RNA was decomposed and
removed by RNase treatment. The DNA was dissolved to a final volume of 30 µl. A 2-µl portion was used for roughly
checking the DNA size and quantity using a minigel, 7 µl was used for sequencing reactions and the remaining portion
(21 µl) was stored as plasmid DNA at 4°C.

This method, after slight changes in the program, enables extraction of the cosmid, which is useful also as a probe 
for FISH (fluorescence in situ hybridization) shown later in the examples. 

Then, the dideoxy terminator method of Sanger et al. [Sanger, F., et al., Proc. Natl. Acad. Sci. USA, 74, 5463-5467
(1977)] using T3, T7 or a synthetic oligonucleotide primer or the cycle sequencing method [Carothers, A. M., et al.,
BioTechniques, 7, 494-499 (1989)] comprising the dideoxy chain terminator method plus PCR method was carried out.
These are methods of terminating the extension reaction specifically to the four bases using a small amount of plasmid
DNA (about 0.1 to 0.5 µg) as a template.

The sequence primers used were FITC (fluorescein isothiocyanate)-labeled ones. Generally, about 25 cycles of
reaction were performed using Taq polymerase. The PCR products were separated on a polyacrylamide urea gel and
the fluorescence-labeled DNA fragments were submitted to an automatic DNA sequencer (ALF™ DNA Sequencer;
Pharmacia) for determining the sequence of about 400 bases from the 5' terminus side of cDNA.

Since the 3' untranslated region is high in heterogeneity for each gene and therefore suited for discriminating
individual genes from one another, sequencing was performed on the 3' side as well depending on the situation.

The vast sum of nucleotide sequence information obtained from the DNA sequencer was transferred to a 64-bit 
DEC 3400 computer for homology analysis by the computer. In the homology analysis, a data base (GenBank, EMBL) 
was used for searching according to the UWGCG FASTA program [Pearson, W. R. and Lipman, D. J., Proc. Natl. Acad. 
Sci. USA, 85, 2444-2448 (1988)]. 

As a result of arbitrary selection by the above method and of cDNA sequence analysis, a clone designated as GEN-501D08
and having a 0.8 kilobase insert was found to show a high level of homology to the C terminal region of the
human Ral guanine nucleotide dissociation stimulator (RalGDS) gene. Since RalGDS is considered to play a certain
role in signal transmission pathways, the whole nucleotide sequence of the cDNA insert portion providing the human
homolog was further determined.

Low-molecular GTPases play an important role in transmitting signals for a number of cell functions including cell
proliferation, differentiation and transformation [Bourne, H. R. et al., Nature, 348, 125-132 (1990); Bourne et al., Nature,
349, 117-127 (1991)].

It is well known that, among them, those proteins encoded by the ras gene family function as molecular switches
or, in other words, the functions of the ras gene family are regulated by different conditions of binding proteins such as
biologically inactive GDP-binding proteins or active GTP-binding proteins, and that these two conditions are induced by
GTPase activating proteins (GAPs) or GDS. The former enzymes induce GDP binding by stimulating the hydrolysis of
bound GTP and the latter enzyme induces the regular GTP binding by releasing bound GDP [Boguski, M. S. and
McCormick, F., Nature, 366, 643-654 (1993)].

RalGDS was first discovered as a member of the ras gene family lacking in transforming activity and as a GDP dissociation
stimulator specific to RAS [Chardin, P. and Tavitian, A., EMBO J., 5, 2203-2208 (1986); Albright, C. F., et al.,
EMBO J., 12, 339-347 (1993)].

In addition to Ral, RalGDS was found to function, through interaction with these proteins, as an effector molecule 
for N-ras, H-ras, K-ras and Rap [Spaargaren, M. and Bischoff, J. R., Proc. Natl. Acad. Sci. USA, 91, 12609-12613 
(1994)]. 

The nucleotide sequence of the cDNA clone designated as GEN-501D08 is shown under SEQ ID NO:3, the nucleotide
sequence of the coding region of said clone under SEQ ID NO:2, and the amino acid sequence encoded by said
nucleotide sequence under SEQ ID NO:1.

This cDNA comprises 842 nucleotides, including an open reading frame comprising 366 nucleotides and coding for 
122 amino acids. The translation initiation codon was found to be located at the 28th nucleotide residue. 
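The figures just given are mutually consistent, as the following sketch shows. It assumes 1-based nucleotide numbering and assumes that the 366-nucleotide open reading frame excludes the stop codon (366 / 3 = 122 codons); the specification does not state either point explicitly, and the function name is hypothetical.

```python
# Illustrative sketch; assumptions are noted in the comments.

CDNA_LENGTH = 842   # total nucleotides in the cDNA (from the text)
ORF_LENGTH = 366    # nucleotides in the open reading frame (from the text)
START_POS = 28      # 1-based position of the initiation codon (from the text)


def orf_layout(cdna_len: int, orf_len: int, start: int) -> dict:
    """Derive codon count and flanking region lengths for an ORF."""
    if orf_len % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    end = start + orf_len - 1  # 1-based position of the last coding nucleotide
    if start < 1 or end > cdna_len:
        raise ValueError("ORF does not fit inside the cDNA")
    return {
        "codons": orf_len // 3,
        "orf_end": end,
        "upstream_nt": start - 1,         # nucleotides 5' of the initiation codon
        "downstream_nt": cdna_len - end,  # nucleotides 3' of the ORF (stop codon included, by assumption)
    }


if __name__ == "__main__":
    print(orf_layout(CDNA_LENGTH, ORF_LENGTH, START_POS))
```

Under these assumptions the reading frame spans nucleotides 28 to 393, leaving 27 nucleotides upstream and 449 nucleotides downstream.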

Comparison between the RalGDS protein known among conventional databases and the amino acid sequence
deduced from said cDNA revealed that the protein encoded by this cDNA is homologous to the C terminal domain of
human RalGDS. The amino acid sequence encoded by this novel gene was found to be 39.5% identical with the C terminal
domain of RalGDS which is thought to be necessary for binding to ras.
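A percent identity figure of this kind is typically computed over an alignment of the two sequences. The sketch below shows one common definition (matching positions divided by aligned, non-gap positions); the exact definition used by the FASTA program in this analysis is not stated in the text, and the function name and toy sequences are hypothetical rather than taken from the sequence listing.

```python
# Illustrative sketch; the toy sequences below are made up.

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment with no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKSAYLAK"))
```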

Therefore, it is presumable, as mentioned above, that this gene product might interact with the ras family proteins
or have influence on the ras-mediated signal transduction pathways. However, this novel gene is lacking in the region
coding for the GDS activity domain and the corresponding protein seems to be different in function from the GDS protein.
This gene was named human RalGDS by the present inventors.

(2) Northern blot analysis 


The expression of the RalGDS protein mRNA in normal human tissues was evaluated by Northern blotting using, 
as a probe, the human cDNA clone labeled by the random oligonucleotide priming method. 

The Northern blot analysis was carried out with a human MTN blot (Human Multiple Tissue Northern blot; Clontech, 
Palo Alto, CA, USA) according to the manufacturer's protocol. 
Thus, the PCR amplification product from the above GEN-501D08 clone was labeled with [32P]dCTP (random-primed
DNA labeling kit, Boehringer-Mannheim) for use as a probe.

For blotting, hybridization was performed overnight at 42°C in a solution comprising 50% formamide/5 x SSC/50 x
Denhardt's solution/0.1% SDS (containing 100 µg/ml denatured salmon sperm DNA). After washing with two portions
of 2 x SSC/0.01% SDS at room temperature, the membrane filter was further washed three times with 0.1 x SSC/0.05%
SDS at 50°C for 40 minutes. An X-ray film (Kodak) was exposed to the filter at -70°C for 18 hours.

As a result, it was revealed that a 900-bp transcript had been expressed in all the human tissues tested. In addition,
a 3.2-kb transcript was observed specifically in the heart and skeletal muscle. The expression of these transcripts differing
in size may be due either to alternative splicing or to cross hybridization with homologous genes.

(3) Cosmid clone and chromosome localization by FISH

FISH was performed by screening a library of human chromosomes cloned in the cosmid vector pWE15 using, as
a probe, the 0.8-kb insert of the cDNA clone [Sambrook, J., et al., Molecular Cloning, 2nd Ed., pp. 3.1-3.58, Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, New York (1989)].

FISH for chromosome assignment was carried out by the method of Inazawa et al. which comprises G-banding pattern
comparison for confirmation [Inazawa, J., et al., Genomics, 17, 153-162 (1993)].

For use as a probe, the cosmid DNA (0.5 µg) obtained from chromosome screening and corresponding to GEN-501D08
was labeled with biotin-16-dUTP by nick translation.
To eliminate the background noise due to repetitive sequences, 0.5 µl of sonicated human placenta DNA (10
mg/ml) was added to 9.5 µl of the probe solution. The mixture was denatured at 80°C for 5 minutes and admixed with
an equal volume of 4 x SSC containing 20% dextran sulfate. Then, a denatured slide was overlaid with the hybridization
mixture and, after covering with paraffin, incubated in a wet chamber at 37°C for 16 to 18 hours. After washing with 50%
formamide/2 x SSC at 37°C for 15 minutes, the slide was washed with 2 x SSC for 15 minutes and further with 1 x SSC
for 15 minutes.
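The quantities in the hybridization mixture described above follow from simple dilution arithmetic, sketched below. The calculation assumes that the probe solution itself contributes no SSC or dextran sulfate, which the text does not state; the function name is hypothetical.

```python
# Illustrative sketch; assumes the probe solution contains no SSC or dextran sulfate.

def hybridization_mix(
    competitor_ul: float = 0.5,          # sonicated human placenta DNA added (µl)
    competitor_mg_per_ml: float = 10.0,  # its stock concentration (mg/ml = µg/µl)
    probe_ul: float = 9.5,               # probe solution volume (µl)
    buffer_ssc_x: float = 4.0,           # SSC strength of the added buffer
    buffer_dextran_pct: float = 20.0,    # dextran sulfate in the added buffer (%)
) -> dict:
    """Compute final volume and concentrations after adding an equal volume of buffer."""
    competitor_ug = competitor_ul * competitor_mg_per_ml  # mg/ml equals µg/µl
    dna_mix_ul = competitor_ul + probe_ul
    buffer_ul = dna_mix_ul                                # "an equal volume"
    final_ul = dna_mix_ul + buffer_ul
    return {
        "competitor_ug": competitor_ug,
        "final_volume_ul": final_ul,
        "final_ssc_x": buffer_ssc_x * buffer_ul / final_ul,
        "final_dextran_pct": buffer_dextran_pct * buffer_ul / final_ul,
        "competitor_ug_per_ul": competitor_ug / final_ul,
    }


if __name__ == "__main__":
    for key, value in hybridization_mix().items():
        print(f"{key}: {value}")
```

Under the stated assumption, the slide is hybridized in roughly 2 x SSC with 10% dextran sulfate and 5 µg of competitor DNA in a 20 µl volume.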

The slide was then incubated in 4 x SSC supplemented with "1% Block Ace" (trademark; Dainippon Pharmaceutical)
containing avidin-FITC (5 µg/ml) at 37°C for 40 minutes. Then, the slide was washed with 4 x SSC for 10 minutes
and with 4 x SSC containing 0.05% Triton X-100 for 10 minutes and immersed in an antifading PPD solution [prepared
by adjusting 100 mg of PPD (Wako Catalog No. 164-015321) and 10 ml of PBS(-) (pH 7.4) to pH 8.0 with 0.5 M
Na2CO3/0.5 M NaHCO3 (9:1, v/v) buffer (pH 9.0) and adding glycerol to make a total volume of 100 ml] containing 1%
DABCO [1% DABCO (Sigma) in PBS(-):glycerol 1:9 (v:v)], followed by counter staining with DAPI (4',6-diamidino-2-phenylindole;
Sigma).

With more than 100 tested cells in the metaphase, a specific hybridization signal was observed on the chromosome
band at 6p21.3, without any signal on other chromosomes. It was thus confirmed that the RalGDS gene is located on
the chromosome 6p21.3.

By using the novel human RalGDS-associated gene of the present invention as obtained in this example, the 
expression of said gene in various tissues can be detected and the human RalGDS protein can be produced in the 
manner of genetic engineering. These are expected to enable studies on the roles of the expression product protein and 
ras-mediated signals in transduction pathways as well as pathological investigations of diseases in which these are
involved, for example cancer, and the diagnosis and treatment of such diseases. Furthermore, it becomes possible to
study the development and progress of diseases involving the same chromosomal translocation of the RalGDS protein 
gene of the present invention, for example tonic spondylitis, atrial septal defect, pigmentary retinopathy, aphasia and 
the like. 

Example 2

Cytoskeleton-associated protein 2 gene (CKAP2 gene) 

(1) Cytoskeleton-associated protein 2 gene cloning and DNA sequencing 

cDNA clones arbitrarily chosen from a human fetal brain cDNA library in the same manner as in Example 1
were subjected to sequence analysis and, as a result, a clone having a base sequence containing the CAP-glycine
domain of the human cytoskeleton-associated protein (CAP) gene and highly homologous to several CAP family genes
was found and named GEN-080G01.

Meanwhile, the cytoskeleton occurs in the cytoplasm and just inside the cell membrane of eukaryotic cells and is a
network structure comprising complicatedly entangled filaments. Said cytoskeleton is constituted of microtubules
composed of tubulin, microfilaments composed of actin, intermediate filaments composed of desmin and vimentin, and so
on. The cytoskeleton not only acts as supportive cellular elements but also isokinetically functions to induce
morphological changes of cells by polymerization and depolymerization in the fibrous system. The cytoskeleton binds to
intracellular organelles, cell membrane receptors and ion channels and thus plays an important role in intracellular
movement and locality maintenance thereof and, in addition, is said to have functions in activity regulation and mutual
information transmission. Thus it supposedly occupies a very important position in physiological activity regulation of
the whole cell. In particular, the relation between canceration of cells and qualitative changes of the cytoskeleton
attracts attention since cancer cells differ in morphology and recognition response from normal cells.

The activity of this cytoskeleton is modulated by a number of cytoskeleton-associated proteins (CAPs). One group
of CAPs is characterized by a highly conserved glycine motif supposedly contributing to association with
microtubules [CAP-GLY domain; Riehemann, K. and Song, C., Trends Biochem. Sci., 18, 82-83 (1993)].

Among the members of this group of CAPs, there are CLIP-170, 150 kDa DAP (dynein-associated protein, or
dynactin), D. melanogaster GLUED, S. cerevisiae BIK1, restin [Bilbe, G., et al., EMBO J., 11, 2103-2113 (1992); Hilliker,
C., et al., Cytogenet. Cell Genet., 65, 172-176 (1994)] and C. elegans 13.5 kDa protein [Wilson, R., et al., Nature, 368,
32-38 (1994)]. Except for the last two proteins, direct or indirect evidence has suggested that they can interact with
microtubules.

The above-mentioned CLIP-170 is essential for the in vitro binding of endocytic vesicles to microtubules and
colocalizes with endocytic organelles [Rickard, J. E. and Kreis, T. E., J. Biol. Chem., 18, 82-83 (1990); Pierre, P., et al., Cell,
70, 887-900 (1992)].

The above-mentioned dynactin is one of the factors constituting the cytoplasmic dynein motor, which functions in
retrograde vesicle transport [Schroer, T. A. and Sheetz, M. P., J. Cell Biol., 115, 1309-1318 (1991)] or probably in the
movement of chromosomes during mitosis [Pfarr, C. M., et al., Nature, 345, 263-265 (1990); Steuer, E. R., et al., Nature,
345, 266-268 (1990); Wordeman, L., et al., J. Cell Biol., 114, 285-294 (1991)].

GLUED, the Drosophila homolog of mammalian dynactin, is essential for the viability of almost all cells and for the
proper organization of some neurons [Swaroop, A., et al., Proc. Natl. Acad. Sci. USA, 84, 6501-6505 (1987); Holzbaur,
E. L. F., et al., Nature, 351, 579-583 (1991)].

BIK1 interacts with microtubules and plays an important role in spindle formation during mitosis in yeasts
[Trueheart, J., et al., Mol. Cell. Biol., 7, 2316-2326 (1987); Berlin, V., et al., J. Cell Biol., 111, 2573-2586 (1990)].

At present, these genes are classified under the term CAP family (CAPs). 

As a result of database searching, the above-mentioned cDNA clone of 463 bp (excluding the poly-A signal)
showed significant homology in nucleotide sequence with the restin and CLIP-170 encoding genes. However, said
clone was lacking in the 5' region as compared with the restin gene and, therefore, the technique of 5' RACE [Frohman,
M. A., et al., Proc. Natl. Acad. Sci. USA, 85, 8998-9002 (1988)] was used to isolate this missing segment.

(2) 5' RACE (5' rapid amplification of cDNA ends) 

A cDNA clone containing the 5' portion of the gene of the present invention was isolated for analysis by the 5' RACE
technique using a commercial kit (5'-Rapid AmpliFinder RACE kit, Clontech) according to the manufacturer's protocol
with minor modifications, as follows.

The gene-specific primer P1 and primer P2 used here were synthesized by the conventional method and their
nucleotide sequences are as shown below in Table 1. The anchor primer used was the one attached to the commercial
kit.

Table 1

Primer       Nucleotide sequence
Primer P1    5'-ACACCAATCCAGTAGCCAGGCTTG-3'
Primer P2    5'-CACTCGAGAATCTGTGAGACCTACATACATGACG-3'
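
The two gene-specific primers can be sanity-checked before synthesis. The following is only a minimal sketch: the sequences are copied from Table 1, while the helper names (`gc_fraction`, `reverse_complement`) are illustrative and not part of the described method. The observation that P2 contains CTCGAG, the recognition sequence of XhoI, is an inference from the sequence itself; the text does not state why that stretch is present.

```python
# Primer sequences copied from Table 1 (written 5'->3').
PRIMERS = {
    "P1": "ACACCAATCCAGTAGCCAGGCTTG",
    "P2": "CACTCGAGAATCTGTGAGACCTACATACATGACG",
}

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the strand the primer anneals to, read 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt", f"GC={gc_fraction(seq):.2f}",
          "target strand:", reverse_complement(seq))

# Assumption-level observation: P2 carries CTCGAG (XhoI recognition sequence).
print("P2 contains CTCGAG:", "CTCGAG" in PRIMERS["P2"])
```

Because both primers are gene-specific antisense primers used against an anchor primer, the reverse complement printed above corresponds to the sense-strand stretch each primer is expected to anneal to.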



cDNA was obtained by reverse transcription of 0.1 µg of human fetal brain poly(A)+ RNA by the random hexamer
technique using reverse transcriptase (SuperScript™ II, Life Technologies) and the cDNA was amplified by the first PCR
using the P1 primer and anchor primer according to Watanabe et al. [Watanabe, T., et al., Cell Genet., in press].

Thus, to 0.1 µg of the above-mentioned cDNA were added 2.5 mM dNTP/1 x Taq buffer (Takara Shuzo)/0.2 µM P1
primer, 0.2 µM adaptor primer/0.25 unit ExTaq enzyme (Takara Shuzo) to make a total volume of 50 µl, followed by
addition of the anchor primer. The mixture was subjected to PCR. Thus, 35 cycles of amplification were performed under
the conditions: 94°C for 45 seconds, 60°C for 45 seconds, and 72°C for 2 minutes. Finally, the mixture was heated at
72°C for 5 minutes.
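
For planning purposes, the cycling conditions stated above can be written down as data and the total programmed hold time computed. This is a rough sketch only: the class and function names are illustrative, and temperature ramping between steps, which depends on the instrument, is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time at that temperature


# Values taken from the text: 35 cycles of 94°C/45 s, 60°C/45 s, 72°C/2 min,
# followed by a final hold at 72°C for 5 min.
CYCLE = (Step(94, 45), Step(60, 45), Step(72, 120))
N_CYCLES = 35
FINAL_EXTENSION = Step(72, 300)


def total_hold_seconds(cycle, n_cycles, final_step) -> int:
    """Sum of programmed hold times; ramp times are not included."""
    return n_cycles * sum(step.seconds for step in cycle) + final_step.seconds


total = total_hold_seconds(CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.1f} min of programmed hold time")
```

The actual run time on a thermal cycler will be longer than this figure because of heating and cooling between steps.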

Then, 1 µl of the 50-µl first PCR product was subjected to amplification by the second PCR using the specific
nested P2 primer and anchor primer. The second PCR product was analyzed by 1.5% agarose gel electrophoresis.

Upon agarose gel electrophoresis, a single band, about 650 nucleotides in size, was detected. The product from
this band was inserted into a vector (pT7Blue(R)T-Vector, Novagen) and a plurality of clones with an insert having an
appropriate size were selected.

Six of the 5' RACE clones obtained from the PCR product had the same sequence but had different lengths. By
sequencing two overlapping cDNA clones, GEN-080G01 and GEN-080G0149, the protein-encoding sequence and 5'
and 3' flanking sequences, 1015 nucleotides in total length, were determined. Said gene was named
cytoskeleton-associated protein 2 gene (CKAP2 gene).

The nucleotide sequence obtained from the above-mentioned two overlapping cDNA clones GEN-080G01 and
GEN-080G0149 is shown under SEQ ID NO:6, the nucleotide sequence of the coding region of said clone under SEQ
ID NO:5, and the amino acid sequence encoded by said nucleotide sequence under SEQ ID NO:4.

As shown under SEQ ID NO:6, the CKAP2 gene had a relatively GC-rich 5' noncoding region, with incomplete
triplet repeats, (CAG)4(CGG)4(CTG)(CGG), occurring at nucleotides 40-69.
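
The compact repeat notation can be expanded and checked against the stated coordinates. The sketch below is illustrative only: `expand_repeats` is a hypothetical helper, and the expanded string is what the notation implies, not a copy of SEQ ID NO:6, which is not reproduced in this section.

```python
import re


def expand_repeats(notation: str) -> str:
    """Expand notation such as '(CAG)4(CTG)' into the literal base string.

    A unit without a trailing number is taken to occur once.
    """
    units = re.findall(r"\(([ACGT]+)\)(\d*)", notation)
    return "".join(unit * int(count or 1) for unit, count in units)


REPEAT_NOTATION = "(CAG)4(CGG)4(CTG)(CGG)"
expanded = expand_repeats(REPEAT_NOTATION)

# Nucleotides 40-69 inclusive, as stated in the text.
span_length = 69 - 40 + 1

print(expanded)
print(len(expanded), "nt expanded;", span_length, "nt in the stated span")
```

Ten triplets give 30 nucleotides, which agrees with the span 40-69 counted inclusively.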

ATG located at nucleotides 274-276 is the presumable start codon. A stop codon (TGA) was situated at nucleotides
853-855. A polyadenylation signal (ATTAAA) was followed by 16 nucleotides before the poly(A) start. The estimated
open reading frame comprises 579 nucleotides coding for 193 amino acid residues with a calculated molecular weight
of 21,800 daltons.
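
The stated open reading frame size follows directly from the codon coordinates. A minimal sketch of that arithmetic, assuming 1-based positions and an ORF that excludes the stop codon (the function name is illustrative):

```python
def orf_from_codons(start_codon_first: int, stop_codon_first: int) -> tuple[int, int]:
    """Return (ORF length in nt, number of encoded residues).

    Positions are 1-based. The ORF runs from the first base of the start
    codon up to, but not including, the first base of the stop codon.
    """
    length_nt = stop_codon_first - start_codon_first
    if length_nt <= 0 or length_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return length_nt, length_nt // 3


# CKAP2 values from the text: ATG at 274-276, TGA at 853-855.
ckap2_nt, ckap2_aa = orf_from_codons(274, 853)
print(ckap2_nt, "nt,", ckap2_aa, "residues")   # 579 nt, 193 residues

# Mean residue mass implied by the calculated molecular weight of 21,800 Da.
print(f"{21800 / ckap2_aa:.1f} Da per residue on average")
```

The frame check in `orf_from_codons` is a cheap guard against coordinate typos: a start and stop codon that are not a multiple of three apart cannot delimit the same reading frame.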

The coding region was further amplified by RT-PCR, to eliminate the possibility of the synthetic sequence obtained 
being a cDNA chimera. 

(2) Similarity of CKAP2 to other CAPs

While sequencing of CKAP2 revealed homology with the sequences of restin and CLIP-170, the homologous
region was limited to a short sequence corresponding to the CAP-GLY domain. On the amino acid level, the deduced
CKAP2 was highly homologous to five other CAPs in this domain.

CKAP2 lacked other motifs characteristic of some CAPs, such as the alpha-helical rod and the zinc finger motif.
The alpha-helical rod is thought to contribute to dimerization and to increase the microtubule binding capacity [Pierre,
P., et al., Cell, 70, 887-900 (1992)]. The lack of the alpha-helical domain might mean that CKAP2 is incapable of
homo- or heterodimer formation.

Alignment of the CAP-GLY domains of these proteins revealed that conserved residues other than glycine
residues are also found in CKAP2. CAPs having a CAP-GLY domain are thought to be associated with the activities of
cellular organelles and the interactions thereof with microtubules. Since it contains a CAP-GLY domain, as mentioned
above, CKAP2 is placed in the family of CAPs.

Studies with mutants of Glued have revealed that the Glued product plays an important role in almost all cells
[Swaroop, A., et al., Proc. Natl. Acad. Sci. USA, 84, 6501-6505 (1987)] and that it has other neuron-specific functions in
neuronal cells [Meyerowitz, E. M. and Kankel, D. R., Dev. Biol., 62, 112-142 (1978)]. These microtubule-associated proteins
are thought to function in vesicle transport and mitosis. Because of the importance of the vesicle transport system in
neuronal cells, defects in these components might lead to aberrant neuronal systems.

In view of the above, CKAP2 might be involved in specific neuronal functions as well as in fundamental cellular 
functions. 

(3) Northern blot analysis 

The expression of human CKAP2 mRNA in normal human tissues was examined by Northern blotting in the same 
manner as in Example 1 (2) using the GEN-080G01 clone (corresponding to nucleotides 553-1015) as a probe. 

As a result, in all the eight tissues tested, namely human heart, brain, placenta, lung, liver, skeletal muscle, kidney
and pancreas, a 1.0 kb transcript agreeing in size with the CKAP2 cDNA was detected. Said 1.0 kb transcript was
expressed at significantly higher levels in heart and brain than in the other tissues examined. Two weak bands, 3.4 kb
and 4.6 kb, were also detected in all the tissues examined.

According to the Northern blot analysis, the 3.4 kb and 4.6 kb transcripts might possibly be derived from the same
gene coding for the 1.0 kb CKAP2 by alternative splicing or transcribed from other related genes. These characteristics
of the transcripts may indicate that CKAP2 might also code for a protein having a CAP-GLY domain as well as an alpha
helix.

(4) Cosmid cloning and chromosomal localization by direct R-banding FISH 

Two cosmids corresponding to the CKAP2 cDNA were obtained. These two cosmid clones were subjected to direct
R-banding FISH in the same manner as in Example 1 (3) for chromosomal locus mapping of CKAP2.

For suppressing the background due to repetitive sequences, a 20-fold excess of human Cot-I DNA
(BRL) was added as described by Lichter et al. [Lichter, P., et al., Proc. Natl. Acad. Sci. USA, 87, 6634-6638 (1990)]. A
Provia 100 film (Fuji ISO 100; Fuji Photo Film) was used for photomicrography.

As a result, CKAP2 was mapped on chromosome bands 19q13.11-q13.12.

Two autosomal dominant neurological diseases have been localized to this region by linkage analysis: CADASIL
(cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy) between the DNA
markers D19S221 and D19S222, and FHM (familial hemiplegic migraine) between D19S215 and D19S216. These two
diseases may be allelic disorders in which the same gene is involved [Tournier-Lasserve, E., et al., Nature Genet., 3,
256-259 (1993); Joutel, A., et al., Nature Genet., 5, 40-45 (1993)].

Although no evidence is available to support CKAP2 as a candidate gene for FHM or CADASIL, it is conceivable
that its mutation might lead to some neurological disease or other.

By using the novel human CKAP2 gene of the present invention as obtained in this example, it is possible to detect
the expression of said gene in various tissues or produce the human CKAP2 gene in the manner of genetic
engineering. Through these, it becomes possible to analyze the functions of the human CKAP2 system or human CKAP2, which
is involved in diverse activities essential to cells, as mentioned above, to diagnose various neurological diseases in
which said system or gene is involved, for example familial migraine, and to screen out and evaluate a therapeutic or
prophylactic drug therefor.

Example 3

OTK27 gene

(1) OTK27 gene cloning and DNA sequencing 

As a result of sequence analysis of cDNA clones arbitrarily selected from a human fetal brain cDNA library in the
same manner as in Example 1 (1) and database searching, a cDNA clone, GEN-025F07, coding for a protein highly
homologous to NHP2, a yeast nucleoprotein [Saccharomyces cerevisiae: Kolodrubetz, D. and Burgum, A., YEAST, 7,
79-90 (1991)], was found and named OTK27.

Nucleoproteins are fundamental cellular constituents of chromosomes, ribosomes and so forth and are thought to
play an essential role in cell multiplication and viability. The yeast nucleoprotein NHP2, a high-mobility group (HMG)-like
protein, reportedly has, like HMG, a function essential for cell viability [Kolodrubetz, D. and Burgum, A., YEAST, 7,
79-90 (1991)].

The novel human gene, OTK27 gene, of the present invention, which is highly homologous to the above-mentioned
yeast NHP2 gene, is supposed to be similar in function.

The nucleotide sequence of said GEN-025F07 clone was found to comprise 1493 nucleotides, as shown under
SEQ ID NO:9, and contain an open reading frame comprising 384 nucleotides, as shown under SEQ ID NO:8, coding
for an amino acid sequence comprising 128 amino acid residues, as shown under SEQ ID NO:7. The initiation codon
was located at nucleotides 95-97 of the sequence shown under SEQ ID NO:9, and the termination codon at nucleotides
479-481.

At the amino acid level, the OTK27 protein was highly homologous (38%) to NHP2. It was 83% identical with the
protein deduced from a cDNA from Arabidopsis thaliana (Newman, T., unpublished; GENEMBL Accession No. T14197).

(2) Northern blot analysis

For examining the expression of human OTK27 mRNA in normal human tissues, the insert in the OTK27 cDNA was
amplified by PCR, the PCR product was purified and labeled with [32P]-dCTP (random-primed DNA labeling kit,
Boehringer Mannheim), and Northern blotting was performed using the labeled product as a probe in the same manner as
in Example 1 (2).

As a result of the Northern blot analysis, two bands corresponding to possible transcripts from this gene were
detected at approximately 1.6 kb and 0.7 kb. Both sizes of transcript were expressed in all normal adult tissues
examined. However, the expression of the 0.7 kb transcript was significantly reduced in brain and was at higher levels in
heart, skeletal muscle and testis than in other tissues examined.

For further examination of these two transcripts, eleven cDNA clones were isolated from a testis cDNA library and
their DNA sequences were determined in the same manner as in Example 1 (1).

As a result, in six clones, the sequences were found to be in agreement with that of the 0.7 kb transcript, with a
poly(A) sequence starting at around the 600th nucleotide, namely at the 598th nucleotide in two of the six clones, at the
606th nucleotide in three clones, and at the 613th nucleotide in one clone.
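
The poly(A) start positions observed in the six clones can be tabulated against the upstream "TATAAA" signal, which this example places at nucleotides 583-588. The sketch below only restates those figures; the variable and function names are illustrative.

```python
from collections import Counter

# The "TATAAA" signal is reported at nucleotides 583-588 (1-based).
POLYA_SIGNAL_END = 588

# poly(A) start position -> number of clones, as reported for the six clones.
polya_starts = Counter({598: 2, 606: 3, 613: 1})


def spacer_length(signal_end: int, polya_start: int) -> int:
    """Number of nucleotides lying between the signal and the poly(A) start."""
    return polya_start - signal_end - 1


spacers = {pos: spacer_length(POLYA_SIGNAL_END, pos) for pos in polya_starts}

for pos in sorted(polya_starts):
    print(f"poly(A) at {pos}: {polya_starts[pos]} clone(s), "
          f"{spacers[pos]} nt after the signal")
print("clones in total:", sum(polya_starts.values()))
```

The spacers of 9, 17 and 24 nucleotides show that the cleavage site is heterogeneous among clones sharing the same signal.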

In these six clones, the "TATAAA" sequence was recognized at nucleotides 583-588 as a probable poly(A) signal.
The upstream poly(A) signal "TATAAA" of this gene appeared to have little influence in brain and to be more effective in
the three tissues mentioned above than in other tissues. The possibility was considered that the stability of each
transcript varies from tissue to tissue.

Results of zoo blot analysis indicated that this gene is well conserved also in other vertebrates. Since this gene is
expressed ubiquitously in normal adult tissues and conserved among a wide range of species, the gene product is likely
to play an important physiological role. The evidence that yeasts lacking in NHP2 are nonviable suggests that the
human homolog may also be essential to cell viability.

(3) Chromosomal localization of OTK27 by direct R-banding FISH

One cosmid clone corresponding to the cDNA OTK27 was isolated from a total human genomic cosmid library
(5-genome equivalent) using the OTK27 cDNA insert as a probe and subjected to FISH in the same manner as in Example
1 (3) for chromosomal localization of OTK27.

As a result, two distinct spots were observed on the chromosome band 12q24.3. 

The OTK27 gene of the present invention can be used in causing expression thereof and detecting the OTK27
protein, a human nucleoprotein, and thus can be utilized in the diagnosis and pathologic studies of various diseases in
which said protein is involved and, because of its involvement in cell proliferation and differentiation, in screening out
and evaluating therapeutic and preventive drugs for cancer.

Example 4 

OTK18 gene

(1) OTK18 gene cloning and DNA sequencing 

Zinc finger proteins constitute a large family of transcription-regulating proteins in eukaryotes and
carry evolutionally conserved structural motifs [Kadonaga, J. T., et al., Cell, 51, 1079-1090 (1987); Klug, A. and
Rhodes, D., Trends Biochem. Sci., 12, 464-469 (1987); Evans, R. M. and Hollenberg, S. M., Cell, 52, 1-3 (1988)].

The zinc finger, a loop-like motif formed by the interaction between the zinc ion and two residues, cysteine and
histidine residues, is involved in the sequence-specific binding of a protein to RNA or DNA. The zinc finger motif was first
identified within the amino acid sequence of the Xenopus transcription factor IIIA [Miller, J., et al., EMBO J., 4,
1609-1614 (1986)].

The C2H2 finger motif is in general tandemly repeated and contains an evolutionally conserved intervening
sequence of 7 or 8 amino acids. This intervening stretch was first identified in the Kruppel segmentation gene of
Drosophila [Rosenberg, U. B., et al., Nature, 319, 336-339 (1986)]. Since then, hundreds of C2H2 zinc finger
protein-encoding genes have been found in vertebrate genomes.

As a result of sequence analysis of cDNA clones arbitrarily selected from a human fetal brain cDNA library in the
same manner as in Example 1 (1) and database searching, several zinc finger structure-containing clones were
identified and, further, a clone having a zinc finger structure of the Kruppel type was found.

Since this clone lacked the 5' portion of the transcript, plaque hybridization was performed with a fetal brain cDNA
library using, as a probe, an approximately 1.8 kb insert in the cDNA clone, whereby three clones were isolated. The
nucleotide sequences of these were determined in the same manner as in Example 1 (1).

Among the three clones, the one having the largest insert spans 3,754 nucleotides including an open reading frame
of 2,133 nucleotides coding for 711 amino acids. It was found that said clone contains a novel human gene coding for
a peptide highly homologous in the zinc finger domain to those encoded by human ZNF41 and the Drosophila Kruppel
gene. This gene was named OTK18 gene (derived from the clone GEN-076C09).

The nucleotide sequence of the cDNA clone of the OTK18 gene is shown under SEQ ID NO:12, the coding
region-containing nucleotide sequence under SEQ ID NO:11, and the predicted amino acid sequence encoded by said OTK18
gene under SEQ ID NO:10.

It was found that the amino acid sequence of OTK18 as deduced from SEQ ID NO:12 contains 13 finger motifs on 
its carboxy side. 

(2) Comparison with other zinc finger motif-containing genes

Comparison among OTK18, human ZNF41 and the Drosophila Kruppel gene revealed that each finger motif is for
the most part conserved in the consensus sequence CXECGKAFXQKSXLX2HQRXH.
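
A consensus of this kind can be turned into a search pattern for scanning a protein sequence. The following is a sketch under stated assumptions: X is read as "any residue" and X2 as "any two residues", the helper `consensus_to_regex` is hypothetical, and the test finger below is an invented string built to fit the consensus, not a sequence from SEQ ID NO:10. Because the motif is only "for the most part" conserved, an exact pattern like this would miss fingers that deviate at any fixed position.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'CXEC...LX2H' into a regular expression.

    'X' becomes '.', 'X<n>' becomes '.{n}', all other letters stay literal.
    """
    def replace(match: re.Match) -> str:
        count = match.group(1)
        return "." if not count else ".{%s}" % count

    return re.sub(r"X(\d*)", replace, consensus)


CONSENSUS = "CXECGKAFXQKSXLX2HQRXH"
pattern = consensus_to_regex(CONSENSUS)
print(pattern)   # C.ECGKAF.QKS.L.{2}HQR.H

# Hypothetical finger constructed to fit the consensus (illustration only).
example_finger = "CEECGKAFSQKSNLTRHQRIH"
tandem = "TGEKPYK".join([example_finger] * 3)   # invented linker between units

hits = [m.start() + 1 for m in re.finditer(pattern, tandem)]
print("1-based start positions of matches:", hits)
```

For real data a position-specific scoring approach that tolerates mismatches would be a more suitable choice than an exact regular expression.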

Comparison of the consensus sequence of the zinc finger motifs of OTK18 with those of human ZNF41 and the
Drosophila Kruppel gene revealed that the Kruppel type motif is well conserved in the OTK18-encoded protein.
However, the sequence similarities were limited to zinc finger domains and no significant homologies were found with regard
to other regions.

The zinc finger domain interacts specifically with the target DNA, recognizing an about 5 bp sequence to thereby
bind to the DNA helix [Rhodes, D. and Klug, A., Cell, 46, 123-132 (1986)].

Based on the idea that, in view of the above, the multiple module (tandem repetitions of zinc finger) can interact
with long stretches of DNA, it is presumable that the target DNA of this gene product containing 13 repeated zinc finger
units would be a DNA fragment with a length of approximately 65 bp (13 units x about 5 bp per unit).

(3) Northern blot analysis

Northern blot analysis was performed as described in Example 1 (2) for checking normal human tissues for
expression of the human OTK18 mRNA therein by amplifying the insert of the OTK18 cDNA by PCR, purifying the PCR
product, labeling the same with [32P]-dCTP (random-primed DNA labeling kit, Boehringer Mannheim) and using an MTN
blot with the labeled product as a probe.

The results of Northern blot analysis revealed that the transcript of OTK18 is approximately 4.3 kb long and is 
expressed ubiquitously in various normal adult tissues. However, the expression level in the liver and in peripheral blood 
lymphocytes seemed to be lower than in other organs tested. 

(4) Cosmid cloning and chromosomal localization by direct R-banding FISH 

Chromosomal localization of OTK18 was carried out as described in Example 1 (3). 

As a result, complete twin spots were identified with 8 samples while 23 samples showed an incomplete signal or
twin spots on either or both homologs. All signals appeared at the q13.4 band of chromosome 19. No twin spots were
observed on any other chromosomes.

The results of FISH thus revealed that this gene is localized on chromosomal band 19q13.4. This region is known
to contain many DNA segments that hybridize with oligonucleotides corresponding to zinc finger domains [Hoovers, J.
M. N., et al., Genomics, 12, 254-263 (1992)]. In addition, at least one other gene coding for a zinc finger domain has
been identified in this region [Marine, J.-C., et al., Genomics, 21, 285-286 (1994)].

Hence, the chromosome 19q13 is presumably a site of grouping of multiple genes coding for transcription-regulat- 
ing proteins. 

When the novel human OTK18 gene provided by this example is used, it becomes possible to detect expression of
said gene in various tissues and produce the human OTK18 protein in the manner of genetic engineering. Through
these, it is possible to analyze the functions of the human transcription regulating protein gene system or human
transcription regulating proteins, which are deeply involved in diverse activities fundamental to cells, as mentioned above,
to diagnose various diseases with which said gene is associated, for example malformation or cancer resulting from a
developmental or differentiation anomaly, and mental or nervous disorder resulting from a developmental anomaly in
the nervous system, and further to screen out and evaluate therapeutic or prophylactic drugs for these diseases.

Example 5 

Genes encoding human 26S proteasome constituent P42 protein and P27 protein 

(1) Cloning and DNA sequencing of genes respectively encoding human 26S proteasome constituent P42 protein and
P27 protein

Proteasome, which is a multifunctional protease, is an enzyme occurring widely in eukaryotes from yeasts to
humans and decomposing ubiquitin-binding proteins in cells in an energy-dependent manner. Structurally, said
proteasome is constituted of 20S proteasome composed of various constituents with a molecular weight of 21 to 31
kilodaltons and a group of PA700 regulatory proteins composed of various constituents with a molecular weight of 30 to 112
kilodaltons and showing a sedimentation coefficient of 22S and, as a whole, occurs as a macromolecule with a
molecular weight of about 2 million daltons and a sedimentation coefficient of 26S [Rechsteiner, M., et al., J. Biol. Chem., 268,
6065-6068 (1993); Yoshimura, T., et al., J. Struct. Biol., 111, 200-211 (1993); Tanaka, K., et al., New Biologist, 4,
173-187 (1992)].

Despite structural and mechanical analyses thereof, the whole picture of proteasome is not yet fully clear. However, 
according to studies using yeasts and mice in the main, it reportedly has the functions mentioned below and its func- 
tions are becoming more and more elucidated. 

The mechanism of energy-dependent proteolysis in cells starts with selection of proteins by ubiquitin binding. It is
not 20S proteasome but 26S proteasome that has ubiquitin-conjugated protein decomposing activity which is
ATP-dependent [Chu-Ping et al., J. Biol. Chem., 269, 3539-3547 (1994)]. Hence, human 26S proteasome is considered to
be useful in elucidating the mechanism of energy-dependent proteolysis.

Factors involved in the cell cycle regulation are generally short in half-life and in many cases they are subject to
strict quantitative control. In fact, it has been made clear that the oncogene products Mos, Myc, Fos and so forth can be
decomposed by 26S proteasome in an energy- and ubiquitin-dependent manner [Ishida, N., et al., FEBS Lett., 324,
345-348 (1993); Hershko, A. and Ciechanover, A., Annu. Rev. Biochem., 61, 761-807 (1992)] and the importance of
proteasome in cell cycle control is being recognized.

Its importance in the immune system has also been pointed out. It is suggested that proteasome is positively
involved in class I major histocompatibility complex antigen presentation [Michalek, M. T., et al., Nature, 363, 552-554
(1993)] and it is further suggested that proteasome may be involved in Alzheimer disease, since abnormal
accumulation of ubiquitin-conjugated proteins is observed in the brain of patients with Alzheimer disease [Kitaguchi, N.,
et al., Nature, 361, 530-532 (1988)]. Because of its diverse functions such as those mentioned above, proteasome
attracts attention from the viewpoint of its utility in the diagnosis and treatment of various diseases.

A main function of 26S proteasome is ubiquitin-conjugated protein decomposing activity. In particular, it is known
that cell cycle-related gene products such as oncogene products and cyclins, typically c-Myc, are degraded via
ubiquitin-dependent pathways. It has also been observed that the proteasome gene is expressed abnormally in liver cancer
cells, renal cancer cells, leukemia cells and the like as compared with normal cells [Kanayama, H., et al., Cancer Res.,
51, 6677-6685 (1991)] and that proteasome is abnormally accumulated in tumor cell nuclei. Hence, constituents of
proteasome are expected to be useful in studying the mechanism of such canceration and in the diagnosis or treatment of
cancer.

Also, it is known that the expression of proteasome is induced by interferon γ and so on and is deeply involved in
antigen presentation in cells [Aki, M., et al., J. Biochem., 115, 257-269 (1994)]. Hence, constituents of human
proteasome are expected to be useful in studying the mechanism of antigen presentation in the immune system and in
developing immunoregulating drugs.

Furthermore, proteasome is considered to be deeply associated with ubiquitin abnormally accumulated in the brain 
of patients with Alzheimer disease. Hence, it is suggested that constituents of human proteasome should be useful in 
studying the cause of Alzheimer disease and in the treatment of said disease. 

In addition to the utilization of expectedly multifunctional proteasome as such in the above manner, it is probably
possible to produce antibodies using constituents of proteasome as antigens and use such antibodies in diagnosing
various diseases by immunoassay. Its utility in this field of diagnosis is thus also a focus of interest.

Meanwhile, a protein having the characteristics of human 26S proteasome is disclosed, for example in Japanese
Unexamined Patent Publication No. 292964/1993 and rat proteasome constituents are disclosed in Japanese
Unexamined Patent Publication Nos. 268957/1993 and 317059/1993. However, no human 26S proteasome constituents are
known. Therefore, the present inventors made a further search for human 26S proteasome constituents and
successfully obtained two novel human 26S proteasome constituents, namely human 26S proteasome constituent P42 protein
and human 26S proteasome constituent P27 protein, and performed cloning and DNA sequencing of the corresponding
genes in the following manner.

(1) Purification of human 26S proteasome constituents P42 protein and P27 protein

Human proteasome was purified using about 100 g of fresh human kidney and following the method of purifying
human proteasome as described in Japanese Unexamined Patent Publication No. 292964/1993, namely by column
chromatography using BioGel A-1.5 m (5 x 90 cm, Bio-Rad), hydroxyapatite (1.5 x 15 cm, Bio-Rad) and Q-Sepharose
(1.5 x 15 cm, Pharmacia) and glycerol density gradient centrifugation.

The thus-obtained human proteasome was subjected to reversed phase high performance liquid chromatography 
(HPLC) using a Hitachi model L6200 HPLC system. A Shodex RS Pak D4-613 (0.6 x 15 cm, Showa Denko) was used 
and gradient elution was performed with the following two solutions: 

First solution: 0.06% trifluoroacetic acid;

Second solution: 0.05% trifluoroacetic acid, 70% acetonitrile. 
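
During gradient elution the eluent is a changing blend of the two solutions. The sketch below estimates the nominal composition at a given proportion of the second solution, assuming simple linear mixing by volume; the gradient program itself (time course and flow rate) is not given in the text, so the function takes the mixing fraction as its only input. The function name is illustrative.

```python
def eluent_composition(fraction_second: float) -> dict:
    """Nominal eluent composition for a given fraction of the second solution.

    First solution:  0.06% trifluoroacetic acid (TFA), no acetonitrile.
    Second solution: 0.05% TFA, 70% acetonitrile.
    Assumes ideal linear mixing by volume (an approximation).
    """
    if not 0.0 <= fraction_second <= 1.0:
        raise ValueError("fraction_second must lie between 0 and 1")
    tfa = 0.06 * (1.0 - fraction_second) + 0.05 * fraction_second
    acetonitrile = 70.0 * fraction_second
    return {"tfa_percent": tfa, "acetonitrile_percent": acetonitrile}


for fraction in (0.0, 0.25, 0.5, 1.0):
    comp = eluent_composition(fraction)
    print(f"{fraction:4.0%} second solution: "
          f"{comp['acetonitrile_percent']:.1f}% acetonitrile, "
          f"{comp['tfa_percent']:.4f}% TFA")
```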

An aliquot of each eluate fraction was subjected to 8.5% SDS-polyacrylamide electrophoresis under conditions of
reduction with dithiothreitol. The P42 protein and P27 protein thus detected were isolated and purified.

The purified P42 and P27 proteins were respectively digested with 1 µg of trypsin in 0.1 M Tris buffer (pH 7.8)
containing 2 M urea at 37°C for 8 hours and the partial peptide fragments obtained were separated by reversed phase
HPLC and their sequences were determined by Edman degradation. The results obtained are as shown below in Table
2.

Table 2

Partial protein          Amino acid sequence
P42              (1)     VLNISLW
                 (2)     TLMELLNQMDGFDTLHR
                 (3)     AVSDFWSEYXMXA
                 (4)     EVDPLVYNX
                 (5)     HGEIDYEAIVK
                 (6)     LSXGFNGADLRNVXTEAGMFAIXAD
                 (7)     MIMATNPiPDTLDPALLRPGXL
                 (8)     IHIDLPNEQARLDILK
                 (9)     ATNGP RYVWG
                 (10)    EIDGRLK
                 (11)    ALQSVGQfVGEVLK
                 (12)    ILAGPITK
                 (13)    XXVIELPLTNPELFQG
                 (14)    WSSSLVDK
                 (15)    ALQDYRK
                 (16)    EHREQLK
                 (17)    KLESKLDYKPVR
P27              (1)     LVPTR
                 (2)     AKEEEIEAQIK
                 (3)     ANYEVLESQK
                 (4)     VEDALHQLHAR
                 (5)     DVDLYQVR
                 (6)     QSQGLSPAQAFAK
                 (7)     AGSQSGGSP EASGVTVSDVQE
                 (8)     GLLGXNIIPLQR


(2) cDNA library screening, clone isolation and cDNA nucleotide sequence determination

As mentioned in Example 1 (1), the present inventors have a database comprising about 30,000 cDNA data as constructed based on large-scale DNA sequencing using human fetal brain, arterial blood vessel and placenta cDNA libraries.

Based on the amino acid sequences obtained as mentioned above in (1), computer searching was performed with the FASTA program (search for homology between said amino acid sequences and the amino acid sequences estimated from the database). As regards P42, a clone (GEN-331G07) showing identity with regard to two amino acid sequences [(2) and (7) shown in Table 2] was screened out and, as regards P27, a clone (GEN-163D09) showing identity with regard to two amino acid sequences [(1) and (8) shown in Table 2] was found.
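The principle of matching a sequenced peptide against amino acid sequences estimated from cDNA data can be sketched as follows. This is a simplified exact-match illustration in Python, not the FASTA algorithm itself: it translates a nucleotide string in three forward frames and treats each undetermined residue X in the peptide as a wildcard. The toy DNA string and all function names are hypothetical and are not taken from the sequence listing.

```python
import re
from itertools import product

# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def peptide_hits(dna: str, peptide: str) -> list:
    """Return (frame, residue offset) for every match of the peptide.

    An X in the peptide (undetermined Edman cycle) matches any residue.
    """
    pattern = re.compile(peptide.replace("X", "."))
    hits = []
    for frame in range(3):
        for match in pattern.finditer(translate(dna, frame)):
            hits.append((frame, match.start()))
    return hits


# Hypothetical toy cDNA that encodes LVPTR (P27 fragment 1) in frame 1.
toy_cdna = "GCTGGTGCCGACCCGTAA"
print(peptide_hits(toy_cdna, "LVPTR"))
```

A real search would also translate the reverse strand and tolerate mismatches, which this sketch deliberately omits.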

For each of these clones, the 5' side sequence was determined by 5' RACE and the whole sequence was determined, in the same manner as in Example 2 (2).

As a result, it was revealed that the above-mentioned P42 clone GEN-331G07 comprises a 1,566-nucleotide sequence as shown under SEQ ID NO:15, inclusive of a 1,167-nucleotide open reading frame as shown under SEQ ID NO:14, and that the amino acid sequence encoded thereby is the one shown under SEQ ID NO:13 and comprises 389 amino acid residues.

The results of computer homology search revealed that the P42 protein is significantly homologous to the AAA 
(ATPase associated with a variety of cellular activities) protein family (e.g. P45, TBP1, TBP7, S4, MSS1, etc.). It was 
thus suggested that it is a new member of the AAA protein family. 

As for the P27 clone GEN-163D09, it was revealed that it comprises a 1,128-nucleotide sequence as shown under SEQ ID NO:18, including a 669-nucleotide open reading frame as shown under SEQ ID NO:17 and that the amino acid sequence encoded thereby is the one shown under SEQ ID NO:16 and comprises 223 amino acid residues.
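The open reading frame lengths stated for P42 and P27 can be cross-checked against the residue counts, since each residue corresponds to one codon of three nucleotides. The short Python sketch below assumes, as the figures suggest, that the stated ORF lengths exclude the termination codon; the names are illustrative only.

```python
# (ORF length in nucleotides, number of amino acid residues) as stated in the text.
ORFS = {
    "P42": (1167, 389),
    "P27": (669, 223),
}


def orf_matches_protein(orf_nt: int, residues: int) -> bool:
    """True when the ORF length equals three nucleotides per residue."""
    return orf_nt == 3 * residues


for name, (orf_nt, residues) in ORFS.items():
    print(name, orf_matches_protein(orf_nt, residues))
```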

As regards the P27 protein, homology search using a computer failed to reveal any homologous gene among public 
databases. Thus, the gene in question is presumably a novel gene having an unknown function. 

Originally, the above-mentioned P42 and P27 gene products were both purified as regulatory subunit components
of proteasome complex. Therefore, these are expected to play an important role in various biological functions through 
proteolysis, for example a role in energy supply through decomposition of ATP and, hence, they are presumably useful 
not only in studying the function of human 26S proteasome but also in the diagnosis and treatment of various diseases 
caused by lowering of said biological functions, among others. 

Example 6 
BNAP gene 

(1) BNAP gene cloning and DNA sequencing

The nucleosome composed of DNA and histone is a fundamental structure constituting chromosomes in eukaryotic cells and is well conserved across species. This structure is closely associated with the processes of replication and transcription of DNA. However, nucleosome formation is not fully understood as yet. Only certain specific factors involved in nucleosome assembly (NAPs) have been identified. Thus, two acidic proteins, nucleoplasmin and N1, are already known to facilitate nucleosome construction [Kleinschmidt, J. A., et al., J. Biol. Chem., 260, 1166-1176 (1985); Dilworth, S. M., et al., Cell, 51, 1009-1018 (1987)].

A yeast gene, NAP-I, was isolated using a monoclonal antibody and recombinant proteins derived therefrom were 
tested as to whether they have nucleosome assembling activity in vivo. 

More recently, a mouse NAP-I gene, which is a mammalian homolog of the yeast NAP-I gene, was cloned (Okuda, A.; registered in database under the accession number D12618). Also cloned were a mouse gene, DN38 [Kato, K., Eur. J. Neurosci., 2, 704-711 (1990)] and a human nucleosome assembly protein (hNRP) [Simon, H. U., et al., Biochem. J., 297, 389-397 (1994)]. It was shown that the hNRP gene is expressed in many tissues and is associated with T lymphocyte proliferation.

The present inventors performed sequence analysis of cDNA clones arbitrarily chosen from a human fetal brain
cDNA library in the same manner as in Example 1 (1), followed by searches among databases and, as a result, made 
it clear that a 1,125-nucleotide cDNA clone (free of poly(A)), GEN-078D05, is significantly homologous to the mouse 
NAP-I gene, which is a gene for a nucleosome assembly protein (NAP) involved in nucleosome construction, a mouse 
partial cDNA clone, DN38, and hNRP. 

Since said clone GEN-078D05 was lacking in the 5' region, 5' RACE was performed in the same manner as in Example 2 (2) to obtain the whole coding region. For this 5' RACE, primers P1 and P2 respectively having the nucleotide sequences shown below in Table 3 were used.



Table 3

Primer       Nucleotide sequence
Primer P1    5'-TTGAAGAATGATGCATTAGGAACCAC-3'
Primer P2    5'-CACTCGAGTGGCTGGATTTCAATTTCTCCAGTAG-3'



After the first 5' RACE, a single band corresponding to a sequence length of 1,300 nucleotides was obtained. This product was inserted into pT7Blue(R) T-Vector and several clones appropriate in insert size were selected. Ten 5' RACE clones obtained from two independent PCR reactions were sequenced and the longest clone GEN-078D05TA13 (about 1,300 nucleotides long) was further analyzed.

Both strands of the two overlapping cDNA clones GEN-078D05 and GEN-078D05TA13 were sequenced, whereby it was confirmed that the two clones did not yet cover the whole coding region. Therefore, a second 5' RACE was carried out. For the second 5' RACE, two primers, P3 and P4, respectively having the sequences shown below in Table 4 were used.

Table 4

Primer       Nucleotide sequence
Primer P3    5'-GTCGAGCTAGCCATCTCCTCTTCG-3'
Primer P4    5'-CATGGGCGACAGGTTCCGAGACC-3'


A clone, GEN-078D0508, obtained by the second 5' RACE was 300 nucleotides long. This clone contained an estimable initiation codon and three preceding in-frame termination codons. From these three overlapping clones, it became clear that the whole cDNA sequence, including the coding region, comprises 2,636 nucleotides. This gene was named brain-specific nucleosome assembly protein (BNAP) gene.

The BNAP gene contains a 1,518-nucleotide open reading frame shown under SEQ ID NO:20. The amino acid sequence encoded thereby comprises 506 amino acid residues, as shown under SEQ ID NO:19, and the nucleotide sequence of the whole cDNA clone of BNAP is as shown under SEQ ID NO:21.

As shown under SEQ ID NO:21, the 5' noncoding region of said gene was found to be generally rich in GC. Candidate initiation codon sequences were found at nucleotides Nos. 266-268, 287-289 and 329-331. These three sequences all had well conserved sequences in the vicinity of the initiation codons [Kozak, M., J. Biol. Chem., 266, 19867-19870 (1991)].

According to the scanning model, the first ATG (nucleotides Nos. 266-268) of the cDNA clone may be the initiation codon. The termination codon was located at nucleotides Nos. 1784-1786. The 3' noncoding region was generally rich in AT and two polyadenylation signals (AATAAA) were located at nucleotides Nos. 2606-2611 and 2610-2615, respectively.
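Two AATAAA signals whose start positions differ by only four nucleotides necessarily overlap (as in AATAAATAAA), so a plain non-overlapping search would report only one of them. The following Python sketch uses a zero-width lookahead to report every occurrence with 1-based coordinates. The toy sequence is hypothetical and is not taken from SEQ ID NO:21.

```python
import re


def find_polya_signals(seq: str, signal: str = "AATAAA") -> list:
    """Return 1-based start positions of all, possibly overlapping, signals."""
    # A lookahead consumes no characters, so overlapping hits are all found.
    return [m.start() + 1 for m in re.finditer(f"(?={signal})", seq)]


# Hypothetical fragment containing two signals offset by four nucleotides.
toy_3utr = "GCAATAAATAAAGC"
print(find_polya_signals(toy_3utr))
```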

The longest open reading frame comprised 1,518 nucleotides coding for 506 amino acid residues and the calculated molecular weight of the BNAP gene product was 57,600 daltons.

Hydrophilicity plots indicated that BNAP is very hydrophilic, like other NAPs. For recombinant BNAP expression and purification, and to exclude the possibility that the BNAP gene sequence assembled from the three clones was a chimera generated in the 5' RACE steps, RT-PCR was performed using a sequence comprising nucleotides Nos. 326-356 as a sense primer and a sequence comprising nucleotides Nos. 1758-1786 as an antisense primer.

As a result, a single product of about 1,500 bp was obtained and it was thus confirmed that said sequence is not a chimera but a single transcript.
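The size expected for this RT-PCR product follows from the primer coordinates: the amplicon runs from the first nucleotide of the sense primer to the last nucleotide covered by the antisense primer. A minimal Python sketch of that arithmetic is given below; the function name and the 5% tolerance used for comparison with the gel estimate are assumptions made for illustration.

```python
def amplicon_length(sense_start: int, antisense_end: int) -> int:
    """Length of a PCR product spanning both primers (1-based, inclusive)."""
    return antisense_end - sense_start + 1


# Sense primer: nucleotides 326-356; antisense primer: nucleotides 1758-1786.
expected = amplicon_length(326, 1786)
observed_estimate = 1500  # "about 1,500 bp" as read from the gel

# Assumed tolerance of 5% for a size estimated by electrophoresis.
consistent = abs(expected - observed_estimate) / observed_estimate < 0.05
print(expected, consistent)
```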

(2) Comparison between BNAP and NAPs 

The amino acid sequence deduced from BNAP showed 46% identity and 65% similarity to hNRP. The deduced BNAP gene product had motifs characteristic of the NAPs already reported and of BNAP. In general, the C-terminal half was well conserved in humans and yeasts.

The first motif (domain I) is KGIPDYWLI (corresponding to amino acid residues Nos. 309-317). This was observed also in hNRP (KGIPSFWLT) and in yeast NAP-I (KGIPEFWLT).

The second motif (domain II) is ASFFNFFSPP (corresponding to amino acid residues Nos. 437-446) and this appeared as DSFFNFFAPP in hNRP and as ESFFNFFSP in yeast NAP-I.
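The degree of conservation of the two motifs can be quantified by counting identical residues position by position. The Python sketch below does this for the motif sequences quoted above; where the quoted strings differ in length, only the overlapping leading positions are compared, which is a simplifying assumption, and the function name is illustrative.

```python
def identity(a: str, b: str) -> tuple:
    """Return (identical positions, positions compared) for two motifs.

    Comparison is ungapped and stops at the end of the shorter string.
    """
    pairs = list(zip(a, b))
    return sum(x == y for x, y in pairs), len(pairs)


# Motif sequences as quoted in the text.
DOMAIN_I = {"BNAP": "KGIPDYWLI", "hNRP": "KGIPSFWLT", "yeast NAP-I": "KGIPEFWLT"}
DOMAIN_II = {"BNAP": "ASFFNFFSPP", "hNRP": "DSFFNFFAPP", "yeast NAP-I": "ESFFNFFSP"}

for label, motifs in (("domain I", DOMAIN_I), ("domain II", DOMAIN_II)):
    for species in ("hNRP", "yeast NAP-I"):
        same, total = identity(motifs["BNAP"], motifs[species])
        print(f"{label}: BNAP vs {species}: {same}/{total} identical")
```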

These two motifs were also conserved in the deduced mouse NAP-I and DN38 peptides. Both conserved motifs were each a hydrophobic cluster, and the Cys in position 402 was also found conserved.

The N-terminal half had no motifs strictly conserved from yeasts to mammalian species, while motifs conserved among mammalian species were found. For instance, HDLERKYA (corresponding to amino acid residues Nos. 130 to 137) and IINAEYEPTEEECEW (corresponding to amino acid residues Nos. 150-164), which may be associated with mammal-specific functions, were found strictly conserved.

NAPs had acidic stretches, which are believed to be readily capable of binding to histone or other basic proteins. 
All NAPs had three acidic stretches but the locations thereof were not conserved. 
BNAP has no such three acidic stretches but, instead, three repeated sequences (corresponding to amino acid residues Nos. 194-207, 208-221 and 222-235) with a long acidic cluster, inclusive of 41 amino acid residues out of 98 amino acid residues, the consensus sequence being ExxKExPEVKxEEK (each x being a nonconserved, mostly hydrophobic, residue).
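A consensus of this form maps directly onto a regular expression in which each x becomes a single-residue wildcard. The Python sketch below converts the consensus and locates three tandem 14-residue repeats, numbering them from residue 194. The repeat sequences used here are hypothetical stand-ins that merely fit the consensus; the actual residues are those of SEQ ID NO:19.

```python
import re

CONSENSUS = "ExxKExPEVKxEEK"


def consensus_to_regex(consensus: str):
    """Compile a consensus where lowercase x stands for any single residue."""
    return re.compile(consensus.replace("x", "."))


# Hypothetical tandem repeats that satisfy the consensus (illustration only).
HYPOTHETICAL_REPEATS = "EAAKEVPEVKLEEK" + "ELVKEAPEVKVEEK" + "EIAKELPEVKAEEK"
FIRST_RESIDUE = 194  # residue number of the first repeat, per the text

pattern = consensus_to_regex(CONSENSUS)
starts = [FIRST_RESIDUE + m.start() for m in pattern.finditer(HYPOTHETICAL_REPEATS)]
print(starts)
```

With 14 residues per repeat, the start positions fall at the residue numbers stated for the three repeats.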

Furthermore, it was revealed that the BNAP sequence had several BNAP-specific motifs. Thus, an extremely serine-rich domain (corresponding to amino acid residues Nos. 24-72) with 33 (67%) of 49 amino acid residues being serine residues was found in the N-terminus portion. On the nucleic acid level, they were reflected as incomplete repetitions of AGC.

Following this serine-rich region, there appeared a basic domain (corresponding to amino acid residues Nos. 71-89) comprising 10 basic amino acid residues among 19 residues.

BNAP is supposed to be localized in the nucleus. Two possible nuclear localization signals (NLSs) were observed. The first signal was found in the basic domain of BNAP and its sequence YRKKR (corresponding to amino acid residues Nos. 75-79) was similar to the NLS (GRKKR) of Tat of HIV-1. The second signal was located in the C terminus and its sequence KKYRK (corresponding to amino acid residues Nos. 502-506) was similar to the NLS (KKKRK) of the large T antigen of SV40. The presence of these two presumable NLSs suggested the localization of BNAP in the nucleus. However, the possibility that other basic clusters might act as NLSs could not be excluded.

BNAP has several phosphorylation sites and the activity of BNAP may be controlled through phosphorylation 
thereof. 

(3) Northern blot analysis

Northern blot analysis was performed as described in Example 1 (2). Thus, the clone GEN-078D05TA13 (corresponding to nucleotides Nos. 323 to 1558 in the BNAP gene sequence) was amplified by PCR, the PCR product was purified and labeled with [32P]-dCTP (random-primed DNA labeling kit, Boehringer Mannheim), and the expression of BNAP mRNA in normal human tissues was examined using an MTN blot with the labeled product as a probe.

As a result of Northern blot analysis, a 3.0 kb transcript of BNAP was detected (8-hour exposure) in the brain among eight human adult tissues tested, namely heart, brain, placenta, lung, liver, skeletal muscle, kidney and pancreas and, after longer exposure (24 hours), a dim band of the same size was detected in the heart.

BNAP was found equally expressed in several sites of brain tested whereas, in other tissues, no signal was detected at all even after 72 hours of exposure. hNRP mRNA was found expressed everywhere in the human tissues tested whereas the expression of BNAP mRNA was tissue-specific.

(4) Radiation hybrid mapping

Chromosomal mapping of the BNAP clone was performed by means of radiation hybrid mapping [Cox, D. R., et al., Science, 250, 245-250 (1990)].

Thus, a total human genome radiation hybrid clone (G3RH) panel was purchased from Research Genetics, Inc., AL, USA and PCR was carried out for chromosomal mapping analysis according to the product manual using two primers, A1 and A2, respectively having the nucleotide sequences shown in Table 5.

Table 5

Primer       Nucleotide sequence
A1 primer    5'-CCTAAAAAGTGTCTAAGTGCCAGTT-3'
A2 primer    5'-TCAGTGAAAGGGAAGGTAGAACAC-3'



The results obtained were analyzed utilizing software available on the Internet [Boehnke, M., et al., Am. J. Hum. Genet., 46, 581-586 (1991)].

As a result, the BNAP gene was found strongly linked to the marker DXS990 (LOD = 1000, cR8000 = -0.00). Since DXS990 is a marker localized on the chromosome Xq21.3-q22, it was established that BNAP is localized to the chromosomal locus Xq21.3-q22 where genes involved in several signs or symptoms of X-chromosome-associated mental retardation are localized.

The nucleosome is not only a fundamental chromosomal structural unit characteristic of eukaryotes but also a gene 
expression regulating unit. Several results indicate that genes with high transcription activity are sensitive to nuclease 
treatment, suggesting that the chromosome structure changes with the transcription activity [Elgin, S. C. R., J. Biol. 
Chem., 263, 19259-19262 (1988)]. 

NAP-I has been cloned in yeast, mouse and human and is one of the factors capable of promoting nucleosome construction in vivo. In a study performed on their sequences, NAPs containing the epitope of the specific antibody 4A8 were detected in human, mouse, frog, Drosophila and yeast (Saccharomyces cerevisiae) [Ishimi, Y., et al., Eur. J. Biochem., 162, 19-24 (1987)].

In these experiments, NAPs, upon SDS-PAGE analysis, electrophoretically migrated to positions corresponding to a molecular weight between 50 and 60 kDa, whereas the recombinant BNAP slowly migrated to a position of about 80 kDa. The epitope of 4A8 was shown to be localized in the second, well-conserved, hydrophobic motif. It was simultaneously shown that the triplet FNF is important as a part of the epitope [Fujii-Nakata, T., et al., J. Biol. Chem., 267, 20980-20986 (1992)].

BNAP also contained this consensus motif in domain II. The fact that domain II is markedly hydrophobic and the
fact that domain II can be recognized by the immune system suggest that it is probably presented on the BNAP surface 
and is possibly involved in protein-protein interactions. 

Domain I, too, may be involved in protein-protein interactions. Considering that these domains are conserved generally among NAPs, though to a relatively low extent, it is conceivable that they are essential for nucleosome construction, although the functional meaning of the conserved domains is still unknown.

The hNRP gene is expressed in thyroid gland, stomach, kidney, intestine, leukemia, lung cancer, mammary cancer and so on [Simon, H. U., et al., Biochem. J., 297, 389-397 (1994)]. Thus, NAPs are expressed ubiquitously and are thought to play an important role in fundamental nucleosome formation.

BNAP may be involved in brain-specific nucleosome formation and an insufficiency thereof may cause neurological diseases or mental retardation as a result of deviated functions of neurons.

BNAP was found strongly linked to a marker on the X chromosome at q21.3-q22, where sequences involved in several symptoms of X-chromosome-associated mental retardation are localized. This center-surrounding region of the X chromosome was rich in genes responsible for α-thalassemia, mental retardation (ATR-X) or some other forms of mental retardation [Gibbons, R. J., et al., Cell, 80, 837-845 (1995)]. As with the ATR-X gene, which seems to regulate the nucleosome structure, the present inventors suppose that BNAP may be involved in a certain type of X-chromosome-linked mental retardation.

According to this example, the novel BNAP gene is provided and, when said gene is used, it is possible to detect the expression of said gene in various tissues and to produce the BNAP protein by the technology of genetic engineering. Through these, it is possible to study the brain nucleosome formation deeply involved, as mentioned above, in variegated activities essential to cells, as well as the functions of cranial nerve cells, to diagnose various neurological diseases or mental retardation in which these are involved, and to screen out and evaluate drugs for the treatment or prevention of such diseases.

Example 7 

Human skeletal muscle-specific ubiquitin-conjugating enzyme gene (UBE2G gene) 

The ubiquitin system is a group of enzymes essential for cellular processes and is conserved from yeast to human. Said system is composed of ubiquitin-activating enzymes (UBAs), ubiquitin-conjugating enzymes (UBCs), ubiquitin protein ligases (UBRs) and 26S proteasome particles.

Ubiquitin is transferred from the above-mentioned UBAs to several UBCs, whereby it is activated. UBCs transfer ubiquitins to target proteins with or without the participation of UBRs. These ubiquitin-conjugated target proteins are said to induce a number of cellular responses, such as protein degradation, protein modification, protein translocation, DNA repair, cell cycle control, transcription control, stress responses and immunological responses [Jentsch, S., et al., Biochim. Biophys. Acta, 1089, 127-139 (1991); Hershko, A. and Ciechanover, A., Annu. Rev. Biochem., 61, 761-807 (1992); Jentsch, S., Annu. Rev. Genet., 26, 179-207 (1992); Ciechanover, A., Cell, 79, 13-21 (1994)].

UBCs are key components of this system and seem to have distinct substrate specificities and modulate different functions. For example, Saccharomyces cerevisiae UBC7 is induced by cadmium and involved in resistance to cadmium poisoning [Jungmann, J., et al., Nature, 361, 369-371 (1993)]. Degradation of MAT-α2 is also executed by UBC7 and UBC6 [Chen, P., et al., Cell, 74, 357-369 (1993)].

The novel gene obtained in this example is a UBC7-like gene strongly expressed in human skeletal muscle. In the following, cloning and DNA sequencing thereof are described.

(1) Cloning and DNA sequencing of human skeletal muscle-specific ubiquitin-conjugating enzyme gene (UBE2G gene) 

Following the same procedure as in Example 1 (1), cDNA clones were arbitrarily selected from a human fetal brain cDNA library and subjected to sequence analysis, and database searches were performed. As a result, a cDNA clone, GEN-423A12, was found to have a significantly high level of homology to the genes coding for ubiquitin-conjugating enzymes (UBCs) in various species. Since said GEN-423A12 clone was lacking in the 5' side, 5' RACE was performed in the same manner as in Example 2 (2) to obtain an entire coding region.

For said 5' RACE, two primers, P1 and P2, respectively having the nucleotide sequences shown in Table 6 were used.

Table 6

Primer       Nucleotide sequence
P1 primer    5'-TAATGAATTTCATTTTAGGAGGTGGG-3'
P2 primer    5'-ATCTTTTGGGAAAGTAAGATGAGCC-3'


The 5' RACE product was inserted into pT7Blue(R) T-Vector and clones with an insert proper in size were selected.
Four of the 5' RACE clones obtained from two independent PCR reactions contained the same sequence but were 
different in length. 

By sequencing the above clones, the coding sequence and adjacent 5'- and 3'-flanking sequences of the novel gene were determined.

As a result, it was revealed that the novel gene has a total length of 617 nucleotides. This gene was named human
skeletal muscle-specific ubiquitin-conjugating enzyme gene (UBE2G gene). 

To exclude the conceivable possibility that this sequence was a chimera clone, RT-PCR was performed in the same manner as in Example 6 (1) using the sense primer to amplify said sequence from the human fetal brain cDNA library. As a result, a single PCR product was obtained, whereby it was confirmed that said sequence is not a chimera.

The UBE2G gene contains an open reading frame of 510 nucleotides, which is shown under SEQ ID NO:23, the
amino acid sequence encoded thereby comprises 170 amino acid residues, as shown under SEQ ID NO:22, and the 
nucleotide sequence of the entire UBE2G cDNA is as shown under SEQ ID NO:24. 

As shown under SEQ ID NO:24, the estimable initiation codon was located at nucleotides Nos. 19-21, corresponding to the first ATG triplet of the cDNA clone. Although no preceding in-frame termination codon was found, it was deduced that this clone contains the entire open reading frame on the following grounds.

(a) The amino acid sequence is highly homologous to S. cerevisiae UBC7 and said initiation codon agrees with that of yeast UBC7, supporting said ATG as the initiation codon. (b) The sequence AGGATGA is similar to the consensus sequence (A/G)CCATGG around the initiation codon [Kozak, M., J. Biol. Chem., 266, 19867-19870 (1991)].
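The comparison with the consensus (A/G)CCATGG can be made explicit by examining the two positions usually regarded as most informative, namely a purine three nucleotides upstream of the ATG (position -3) and a G immediately after it (position +4). The Python sketch below is a simplified illustration under that assumption; the function name is hypothetical and the sketch does not weigh the remaining positions.

```python
def kozak_features(context: str) -> dict:
    """Report the -3 and +4 positions around the first ATG in a short context.

    The context must contain at least three nucleotides before the ATG
    and one nucleotide after it.
    """
    atg = context.find("ATG")
    if atg < 3 or atg + 3 >= len(context):
        raise ValueError("context must span positions -3 to +4 around ATG")
    return {
        "purine_at_minus3": context[atg - 3] in "AG",
        "g_at_plus4": context[atg + 3] == "G",
    }


print(kozak_features("AGGATGA"))  # sequence found around the UBE2G ATG
print(kozak_features("ACCATGG"))  # one form of the consensus
```

On this simplified reading, the UBE2G context shares the purine at position -3 with the consensus but not the G at position +4.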

(2) Comparison in amino acid sequence between UBE2G and UBCs 

Comparison in amino acid sequence between UBE2G and UBCs suggested that the active site cysteine capable of binding to ubiquitin should be the 90th residue cysteine. The peptides encoded by these genes seem to belong to the same family.

(3) Northern blot analysis 

Northern blot analysis was carried out as described in Example 1 (2). Thus, the entire sequence of UBE2G was amplified by PCR, the PCR product was purified and labeled with [32P]-dCTP (random-primed DNA labeling kit, Boehringer Mannheim) and the expression of UBE2G mRNA in normal human tissues was examined using the labeled product as a probe. The membrane used was an MTN blot.

As a result of the Northern blot analysis, 4.4 kb, 2.4 kb and 1.6 kb transcripts could be detected in all 16 human adult tissues, namely heart, brain, placenta, lung, liver, skeletal muscle, kidney, pancreas, spleen, thyroid gland, urinary bladder, testis, ovary, small intestine, large intestine and peripheral blood leukocytes, after 18 hours of exposure. Strong expression of these transcripts was observed in skeletal muscle.

(4) Radiation hybrid mapping 

Chromosomal mapping of the UBE2G clone was performed by radiation hybrid mapping in the same manner as in
Example 6 (4). 

The primers C1 and C4 used in PCR for chromosomal mapping analysis respectively correspond to nucleotides 
Nos. 415-435 and nucleotides Nos. 509-528 in the sequence shown under SEQ ID NO:24 and their nucleotide 
sequences are as shown below in Table 7. 

Table 7

Primer       Nucleotide sequence
C1 primer    5'-GGAGACTCACCTGCTAATGTT-3'
C4 primer    5'-CTCAAAAGCAGTCTCTTGGC-3'

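The coordinates given for primers C1 and C4 can be checked against the primer sequences of Table 7, and the size of the mapping amplicon follows from the outermost coordinates. The Python sketch below performs both checks; the dictionary layout and function names are illustrative and not part of the disclosure.

```python
# Primer sequences from Table 7 with the SEQ ID NO:24 coordinates stated in the text.
PRIMERS = {
    "C1": {"seq": "GGAGACTCACCTGCTAATGTT", "start": 415, "end": 435},
    "C4": {"seq": "CTCAAAAGCAGTCTCTTGGC", "start": 509, "end": 528},
}


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based inclusive interval."""
    return end - start + 1


def coordinates_match(primer: dict) -> bool:
    """True when the primer length equals the length of its stated interval."""
    return len(primer["seq"]) == span_length(primer["start"], primer["end"])


# Expected PCR product: from the first base of C1 to the last base covered by C4.
amplicon = span_length(PRIMERS["C1"]["start"], PRIMERS["C4"]["end"])
print({name: coordinates_match(p) for name, p in PRIMERS.items()}, amplicon)
```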
As a result, the UBE2G gene was found linked to the markers D1S446 (LOD = 12.52, cR8000 = 8.60) and D1S235 (LOD = 9.14, cR8000 = 22.46). These markers are localized to the chromosome bands 1q42.13-q42.3.

UBE2G was expressed strongly in skeletal muscle and very weakly in all other tissues examined. All other UBCs are involved in essential cellular functions, such as cell cycle control, and those UBCs are expressed ubiquitously. However, the expression pattern of UBE2G might suggest a muscle-specific role thereof.

While the three transcripts differing in size were detected, attempts failed to identify which corresponds to the cDNA clone. The primary structure of the UBE2G product showed an extreme homology to yeast UBC7. On the other hand, nematode UBC7 showed strong homology to yeast UBC7. Yeast UBC7 is involved in degradation of the repressor and further confers resistance to cadmium in yeasts. The similarities among these proteins suggest that they belong to the same family.

It is speculated that UBE2G is involved in degradation of muscle-specific proteins and that a defect in said gene could lead to such diseases as muscular dystrophy. Recently, another proteolytic enzyme, calpain 3, was found to be responsible for limb-girdle muscular dystrophy type 2A [Richard, I., et al., Cell, 81, 27-40 (1995)]. At present, the chromosomal location of UBE2G suggests no significant relationship with any hereditary muscular disease but it is likely that a relation to the gene will be unearthed by linkage analysis in future.

In accordance with this example, the novel UBE2G gene is provided and the use of said gene enables detection of its expression in various tissues and production of the UBE2G protein by the technology of genetic engineering. Through these, it becomes possible to study the degradation of muscle-specific proteins deeply involved in variegated basic activities essential to cells, as mentioned above, and the functions of skeletal muscle, to diagnose various muscular diseases in which these are involved and further to screen out and evaluate drugs for the treatment and prevention of such diseases.

Example 8 
TMP-2 gene 

(1) TMP-2 gene cloning and DNA sequencing 

Following the procedure of Example 1 (1), cDNA clones were arbitrarily selected from a human fetal brain cDNA library and subjected to sequence analysis, and database searches were performed. As a result, a clone (GEN-092E10) having a cDNA sequence highly homologous to a transmembrane protein gene (accession No.: U19878) was found.

Membrane protein genes have so far been cloned in frog (Xenopus laevis) and human. Each of these is considered to be a gene for a transmembrane type protein having a follistatin module and an epidermal growth factor (EGF) domain (accession No.: U19878).

The sequence information of the above protein gene indicated that the GEN-092E10 clone was lacking in the 5' region, so that the λgt10 cDNA library (human fetal brain 5'-STRETCH PLUS cDNA; Clontech) was screened using the GEN-092E10 clone as a probe, whereby a cDNA clone containing a further 5' upstream region was isolated.

Both strands of this cDNA clone were sequenced, whereby the sequence covering the entire coding region became
clear. This gene was named TMP-2 gene. 

The TMP-2 gene was found to contain an open reading frame of 1,122 nucleotides, as shown under SEQ ID
NO:26, encoding an amino acid sequence of 374 residues, as shown under SEQ ID NO:25. The nucleotide sequence 
of the entire TMP-2 cDNA clone comprises 1,721 nucleotides, as shown under SEQ ID NO:27. 

As shown under SEQ ID NO:27, the 5' noncoding region was generally rich in GC. Several candidates for the initiation codon were found but, according to the scanning model, the 5th ATG of the cDNA clone (bases Nos. 368-370) was estimated as the initiation codon. The termination codon was located at nucleotides Nos. 1490-1492. The polyadenylation signal (AATAAA) was located at nucleotides Nos. 1703-1708. The calculated molecular weight of the TMP-2 gene product was 41,400 daltons.

As mentioned above, the transmembrane genes have a follistatin module and an EGF domain. These motifs were also found conserved in the novel human gene of the present invention.

The TMP-2 gene of the present invention presumably plays an important role in cell proliferation or intercellular communication, since, on the amino acid level, said gene shows homology, across the EGF domain, to TGF-α [transforming growth factor-α; Derynck, R., et al., Cell, 38, 287-297 (1984)], beta-cellulin [Igarashi, K. and Folkman, J., Science, 259, 1604-1607 (1993)], heparin-binding EGF-like growth factor [Higashiyama, S., et al., Science, 251, 936-939 (1991)] and schwannoma-derived growth factor [Kimura, H., et al., Nature, 348, 257-260 (1990)].

(2) Northern blot analysis 

Northern blot analysis was carried out as described in Example 1 (2). Thus, the clone GEN-092E10 was amplified by PCR, the PCR product was purified and labeled with [32P]-dCTP (random-primed DNA labeling kit, Boehringer Mannheim), and the expression of TMP-2 mRNA in normal human tissues was examined using an MTN blot with the labeled product as a probe.

As a result, high levels of expression were detected in brain and prostate gland. Said TMP-2 gene mRNA was 
about 2 kb in size. 

According to the present invention, the novel human TMP-2 gene is provided and the use of said gene makes it possible to detect the expression of said gene in various tissues or produce the human TMP-2 protein by the technology of genetic engineering and, through these, it becomes possible to study brain tumor and prostatic cancer, which are closely associated with cell proliferation or intercellular communication, as mentioned above, to diagnose these diseases and to screen out and evaluate drugs for the treatment and prevention of such diseases.

Example 9 
Human NPIK gene 
(1) Human NPIK gene cloning and DNA sequencing

Following the procedures of Example 1 and Example 2, cDNA clones were arbitrarily selected from a human fetal brain cDNA library and subjected to sequence analysis, and database searches were performed. As a result, two cDNA clones highly homologous to the gene coding for an amino acid sequence conserved in phosphatidylinositol 3 and 4 kinases [Kunz, J., et al., Cell, 73, 585-596 (1993)] were obtained. These were named GEN-428B12c1 and GEN-428B12c2 and the entire sequences of these were determined as in the foregoing examples.

As a result, the GEN-428B12c1 cDNA clone and the GEN-428B12c2 clone were found to have coding sequences 
differing by 12 amino acid residues at the 5' terminus, the GEN-428B12c1 cDNA clone being longer by 12 amino acid 
residues. 

The GEN-428B12c1 cDNA sequence of the human NPIK gene contained an open reading frame of 2,487 nucleotides, as shown under SEQ ID NO:32, encoding an amino acid sequence comprising 829 amino acid residues, as shown under SEQ ID NO:31. The nucleotide sequence of the full-length cDNA clone comprised 3,324 nucleotides as shown under SEQ ID NO:33.

The estimated initiation codon was located, as shown under SEQ ID NO:33, at nucleotides Nos. 115-117 corresponding to the second ATG triplet of the cDNA clone. The termination codon was located at nucleotides Nos. 2602-2604 and the polyadenylation signal (AATAAA) at Nos. 3305-3310.

On the other hand, the GEN-428B12c2 cDNA sequence of the human NPIK gene contained an open reading frame of 2,451 nucleotides, as shown under SEQ ID NO:29. The amino acid sequence encoded thereby comprised 817 amino acid residues, as shown under SEQ ID NO:28. The nucleotide sequence of the full-length cDNA clone comprised 3,602 nucleotides, as shown under SEQ ID NO:30.

The estimated initiation codon was located, as shown under SEQ ID NO:30, at nucleotides Nos. 429-431 corresponding to the 7th ATG triplet of the cDNA clone. The termination codon was located at nucleotides Nos. 2880-2882 and the polyadenylation signal (AATAAA) at Nos. 3583-3588.
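The coordinates reported for the two NPIK clones are mutually consistent: adding the ORF length to the position of the initiation codon gives the first nucleotide of the termination codon, and the two ORFs differ by exactly the 12 codons mentioned above. A short Python sketch of this bookkeeping follows, assuming that the stated ORF lengths exclude the termination codon; the names are illustrative.

```python
# Values stated in the text for the two cDNA clones.
CLONES = {
    "GEN-428B12c1": {"atg": 115, "orf_nt": 2487, "residues": 829, "stop": 2602},
    "GEN-428B12c2": {"atg": 429, "orf_nt": 2451, "residues": 817, "stop": 2880},
}


def stop_codon_start(atg: int, orf_nt: int) -> int:
    """First nucleotide of the termination codon for an ORF starting at atg."""
    return atg + orf_nt


for name, c in CLONES.items():
    assert c["orf_nt"] == 3 * c["residues"]
    print(name, stop_codon_start(c["atg"], c["orf_nt"]) == c["stop"])

# Difference between the two ORFs, expressed in codons.
codon_difference = (CLONES["GEN-428B12c1"]["orf_nt"] - CLONES["GEN-428B12c2"]["orf_nt"]) // 3
print(codon_difference)
```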

(2) Northern blot analysis

Northern blot analysis was carried out as described in Example 1 (2). Thus, the entire sequence of human NPIK was amplified by PCR, the PCR product was purified and labeled with [32P]-dCTP (random-primed DNA labeling kit, Boehringer Mannheim), and normal human tissues were examined for expression of the human NPIK mRNA using the MTN blot membrane with the labeled product as a probe.

As a result, the expression of the human NPIK gene was observed in the 16 human adult tissues examined,
and transcripts of about 3.8 kb and about 5 kb were detected.

Using primer A, which has the nucleotide sequence shown below in Table 8 and contains the initiation codon of the
GEN-428B12c2 cDNA, and primer B, which is shown in Table 8 and contains the termination codon, PCR was performed with
Human Fetal Brain Marathon-Ready cDNA (Clontech) as a template, and the nucleotide sequence of the PCR product
was determined.



Table 8

Primer      Nucleotide sequence
Primer A    5'-ATGGGAGATACAGTAGTGGAGC-3'
Primer B    5'-TCACATGATGCCGTTGGTGAG-3'



As a result, it was found that the human NPIK mRNA expressed included one lacking nucleotides Nos. 1060-1104
of the GEN-428B12c1 cDNA sequence (SEQ ID NO:33) (amino acids Nos. 316-330 of the amino acid sequence
under SEQ ID NO:31) and one lacking nucleotides Nos. 1897-1911 of the GEN-428B12c1 cDNA sequence (SEQ ID
NO:33) (amino acids Nos. 595-599 of the amino acid sequence under SEQ ID NO:31).
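
The correspondence between the deleted nucleotide ranges and the affected amino acid residues follows from the position of the initiation codon (nucleotide 115 of SEQ ID NO:33). The mapping can be sketched as follows, assuming 1-based inclusive coordinates; the function names are hypothetical.

```python
def residue_span(first_nt, last_nt, cds_start):
    """Map a 1-based, inclusive nucleotide range to 1-based residue numbers."""
    first_res = (first_nt - cds_start) // 3 + 1
    last_res = (last_nt - cds_start) // 3 + 1
    return first_res, last_res


def is_in_frame(first_nt, last_nt, cds_start):
    """True if the range starts on a codon boundary and spans whole codons."""
    return (first_nt - cds_start) % 3 == 0 and (last_nt - first_nt + 1) % 3 == 0


CDS_START = 115  # first nucleotide of the initiation codon in SEQ ID NO:33
DELETIONS = [(1060, 1104), (1897, 1911)]

for first, last in DELETIONS:
    print((first, last), "-> residues", residue_span(first, last, CDS_START),
          "in frame:", is_in_frame(first, last, CDS_START))
```

Both ranges start on a codon boundary and span a whole number of codons, so each deletion removes complete residues without shifting the reading frame.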

It was further revealed that polymorphism existed in this gene (428B12c1.fasta), as shown below in Table 9, in the
region of bases Nos. 1941-1966 of the GEN-428B12c1 cDNA sequence shown under SEQ ID NO:33, whereby a
mutant protein was encoded which resulted from the mutation of IQDSCEITT (amino acid residues Nos. 610-618 in the
amino acid sequence (SEQ ID NO:31) encoded by GEN-428B12c1) into YKILVISA.



Table 9

                              1930      1940      1950     1959
                                 TCGATCAAGCCAATACAAGATTCTTGTGAA
                                 ||||||||||| ||||||||||||||||
   TCCATTTGGGAACAGGAGCGAGTGCCCCTTTCGATCAAGCC-ATACAAGATTCTTGTG--
         1900      1910      1920      1930      1940      1950

1960      1970      1980
   ATTACGACTGATACTGGCATC
   ||| || ||||||||||||||
   ATTTCGGCTGATACTGGCATCATTCAACCAGTGGTCAATGCTGTGTCCATCCATCAGGTG
         1960      1970      1980      1990      2000      2010




(3) Chromosomal mapping of human NPIK gene by FISH 

Chromosomal mapping of the human NPIK gene was carried out by FISH as described in Example 1 (3).

As a result, it was found that the locus of the human NPIK gene is in the chromosomal position 1q21.1-q21.3.

The human NPIK gene, a novel human gene, of the present invention included two cDNAs differing in the 5' region
and capable of encoding 829 and 817 amino acid residues, as mentioned above. In view of this and further in view of
the findings that the mRNA corresponding to this gene includes two deletable sites and there occurs polymorphism in
a specific region corresponding to amino acid residues Nos. 610-618 of the GEN-428B12c1 amino acid sequence
(SEQ ID NO:31), whereby a mutant protein is encoded, it is conceivable that human NPIK includes species resulting
from a certain number of combinations, namely human NPIK, deletion-containing human NPIK, human NPIK mutant
and/or deletion-containing human NPIK mutant.

Recently, several proteins belonging to the family including the above-mentioned PI3 and PI4 kinases have been
found to have protein kinase activity [Dhand, R., et al., EMBO J., 13, 522-533 (1994); Stack, J. H. and Emr, S. D.,
J. Biol. Chem., 269, 31552-31562 (1994); Hartley, K. O., et al., Cell, 82, 849-856 (1995)].

It was also revealed that a protein belonging to this family is involved in DNA repair [Hartley, K. O., et al., Cell, 82,
849-856 (1995)] and that a gene of this family is a causative gene of ataxia [Savitsky, K., et al., Science, 268,
1749-1753 (1995)].

It can be anticipated that the human NPIK gene-encoded protein highly homologous to the family of these PI 
kinases is a novel enzyme phosphorylating lipids or proteins. 

According to this example, the novel human NPIK gene is provided. The use of said gene makes it possible to
detect the expression of said gene in various tissues and manufacture the human NPIK protein by the technology of
genetic engineering and, through these, it becomes possible to study lipid- or protein-phosphorylating enzymes such as
mentioned above, study DNA repair, study or diagnose diseases in which these are involved, for example cancer,
and screen and evaluate drugs for the treatment or prevention thereof.


(4) Construction of an expression vector for fusion protein 

To subclone the coding region of the human NPIK gene (GEN-428B12c2), two primers, C1 and C2, having
the sequences shown below in Table 10 were first prepared based on the information on the DNA sequences obtained
above in (1).



Table 10

Primer       Nucleotide sequence
Primer C1    5'-CTCAGATCTATGGGAGATACAGTAGTGGAGC-3'
Primer C2    5'-TCGAGATCTTCACATGATGCCGTTGGTGAG-3'

Both of the primers C1 and C2 have a BglII site, and primer C2 is an antisense primer.
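
The structure of primers C1 and C2 can be made explicit by splitting each one at the BglII recognition sequence (AGATCT). The sketch below is illustrative only: the helper names are hypothetical, and the primer strings are copied from Tables 8 and 10. It also prints the reverse complement of primer B to show how an antisense primer carries the termination codon.

```python
BGLII_SITE = "AGATCT"  # BglII recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_tailed_primer(primer: str, site: str = BGLII_SITE):
    """Split a 5'-tailed primer into (leading bases, site, gene-specific part)."""
    index = primer.find(site)
    if index < 0:
        raise ValueError("restriction site not found")
    return primer[:index], site, primer[index + len(site):]


PRIMER_A = "ATGGGAGATACAGTAGTGGAGC"
PRIMER_B = "TCACATGATGCCGTTGGTGAG"
PRIMER_C1 = "CTCAGATCTATGGGAGATACAGTAGTGGAGC"
PRIMER_C2 = "TCGAGATCTTCACATGATGCCGTTGGTGAG"

for name, tailed, core in (("C1", PRIMER_C1, PRIMER_A), ("C2", PRIMER_C2, PRIMER_B)):
    lead, site, specific = split_tailed_primer(tailed)
    print(name, "lead:", lead, "site:", site, "matches Table 8 primer:", specific == core)

# Sense-strand reading of the antisense primer B; it ends with the stop codon.
print(reverse_complement(PRIMER_B))
```

Read this way, each Table 10 primer is the corresponding Table 8 primer preceded by three extra bases and a BglII site, and the sense-strand reading of primer B ends in TGA.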

Using these two primers, cDNA derived from human fetal brain mRNA was amplified by PCR to provide a product
having a length of about 2500 bases. The amplified cDNA was precipitated with ethanol and inserted into pT7Blue
T-Vector (product of Novagen) and subcloning was completed. The entire sequence was determined in the same manner
as above in Examples. As a result, it was revealed that this gene had the polymorphism shown above in Table 9.

The above cDNA was cleaved by BglII and subjected to agarose gel electrophoresis. The cDNA was then excised
from agarose gel and collected using GENECLEAN II KIT (product of Bio 101). The cDNA was inserted into
pBlueBacHis2B-Vector (product of Invitrogen) at the BglII cleavage site and subcloning was completed.

The fusion vector thus obtained had a BglII cleavage site and was an expression vector for a fusion protein of the
contemplated gene product (about 91 kD) and 38 amino acids derived from pBlueBacHis2B-Vector and containing a
polyhistidine region and an epitope recognized by the Anti-Xpress™ antibody (product of Invitrogen).
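
As a rough plausibility check on the stated size of the contemplated gene product, the mass can be estimated from the residue count. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a figure taken from this document; the function name is illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average, not from the source


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kilodaltons from its residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


npik = approx_mass_kda(817)  # residues encoded by GEN-428B12c2
tag = approx_mass_kda(38)    # vector-derived residues
print(f"NPIK: ~{npik:.1f} kDa, tag: ~{tag:.1f} kDa, fusion: ~{npik + tag:.1f} kDa")
```

With this assumption the estimate for 817 residues comes out near 90 kD, in the same range as the value of about 91 kD given above; an exact figure would require summing the actual residue masses.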

(5) Transfection into insect cell Sf-9 

The human NPIK gene was expressed according to the Baculovirus expression system. Baculovirus is an
insect-pathogenic virus with a cyclic double-stranded DNA genome and can produce large amounts of inclusion bodies
(polyhedra, composed of the protein polyhedrin) in the cells of insects. Using Bac-N-Blue™ Transfection Kit utilizing
this characteristic of Baculovirus and developed by Invitrogen, the Baculovirus expression was carried out.

Stated more specifically, 4 µg of pBlueBacHis2B containing the region of the human NPIK gene and 1 µg of
Bac-N-Blue™ DNA (product of Invitrogen) were co-transfected into Sf-9 cells in the presence of Insectin™ liposomes
(product of Invitrogen).

Prior to co-transfection, LacZ gene was incorporated into Bac-N-Blue™ DNA, so that LacZ would be expressed
only when homologous recombination took place between the Bac-N-Blue™ DNA and pBlueBacHis2B. Thus when the
co-transfected Sf-9 cells were incubated on agar medium, the plaques of the virus expressing the contemplated gene
were easily detected as blue plaques.

The blue plaques were excised from each agar and suspended in 400 µl of medium to disperse the virus therein.
The suspension was subjected to centrifugation to give a supernatant containing the virus. Sf-9 cells were infected with
the virus again to increase the titre and to obtain a large amount of infective virus solution.

(6) Preparation of human NPIK

The expression of the contemplated human NPIK gene was confirmed three days after infection with the virus as 
follows. 

Sf-9 cells were collected and washed with PBS. The cells were boiled with an SDS-PAGE loading buffer for 5 minutes
and SDS-PAGE was performed. According to the western blot technique using Anti-Xpress as an antibody, the
contemplated protein was detected at the position of its presumed molecular weight. By contrast, in the case of control
cells uninfected with the virus, no band corresponding to human NPIK was observed in the same test.

Stated more specifically, three days after the infection of 15 flasks (175 cm², FALCON) of semi-confluent Sf-9 cells,
the cells were harvested and washed with PBS, followed by resuspension in a buffer (20 mM Tris/HCl (pH 7.5), 1 mM
EDTA and 1 mM DTT). The suspended cells were lysed by four 30-second sonications at 4°C at 30-second
intervals. The sonicated cells were subjected to centrifugation and the supernatant was collected. The protein in the
supernatant was immunoprecipitated using an Anti-Xpress antibody and obtained as a slurry of protein A-Sepharose
beads. The slurry was boiled with an SDS-PAGE loading buffer for 5 minutes. SDS-PAGE was performed for identification
and quantification of NPIK. The slurry itself was used in the assay described below.

(7) Confirmation of PI4 Kinase activity 

NPIK was expected to have the activity of incorporating phosphoric acid at the 4-position of the inositol ring of
phosphatidylinositol (PI), namely, PI4 Kinase activity.

PI4 Kinase activity of NPIK was assayed according to the method of Takenawa, et al. (Yamakawa, A. and Tak- 
enawa, T., J. Biol. Chem., 263, 17555-17560 (1988)) as shown below. 

First prepared was a mixture of 10 µl of an NPIK slurry (20 mM Tris/HCl (pH 7.5), 1 mM EDTA, 1 mM DTT and 50%
protein A beads), 10 µl of a PI solution (prepared by drying 5 mg of a PI-containing commercial chloroform solution in
a stream of nitrogen onto a glass tube wall, adding 1 ml of 20 mM Tris/HCl (pH 7.5) buffer and forming micelles by
sonication), 10 µl of an applied buffer (210 mM Tris/HCl (pH 7.5), 5 mM EGTA and 100 mM MgCl2) and 10 µl of distilled
water. Thereto was added 10 µl of an ATP solution (5 µl of 500 µM ATP, 4.9 µl of distilled water and 0.1 µl of
[γ-32P]ATP (6000 Ci/mmol, product of NEN Co., Ltd.)). The reaction was started at 30°C and continued for 2, 5, 10 and
20 minutes. An incubation time of 10 minutes was adopted because a linear increase in incorporation of phosphoric
acid into PI was observed around 10 minutes in the assay described below.
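
The final concentrations in the 50 µl reaction follow from simple dilution arithmetic. The sketch below assumes five 10 µl components whose volumes are additive (NPIK slurry, PI solution, applied buffer, distilled water and ATP solution), ignores the volume occupied by the beads, and neglects the contribution of the radiolabeled ATP; these simplifications and the function name are assumptions, not statements from the source.

```python
def final_concentration(contributions, total_volume_ul):
    """contributions: iterable of (stock concentration, volume in microliters)."""
    return sum(conc * vol for conc, vol in contributions) / total_volume_ul


TOTAL_UL = 5 * 10  # slurry, PI solution, applied buffer, water, ATP solution

# Tris comes from the slurry buffer, the PI solution and the applied buffer.
tris_mM = final_concentration([(20, 10), (20, 10), (210, 10)], TOTAL_UL)
mgcl2_mM = final_concentration([(100, 10)], TOTAL_UL)
egta_mM = final_concentration([(5, 10)], TOTAL_UL)
atp_uM = final_concentration([(500, 5)], TOTAL_UL)  # unlabeled ATP only

print(f"Tris {tris_mM} mM, MgCl2 {mgcl2_mM} mM, EGTA {egta_mM} mM, ATP {atp_uM} uM")
```

Under these assumptions the 210 mM Tris in the applied buffer brings the reaction to a round 50 mM Tris, with 20 mM MgCl2, 1 mM EGTA and 50 µM unlabeled ATP.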

After completion of the reaction, PI was fractionated by the solvent extraction method and finally re-suspended in 
chloroform. The suspension was developed by thin layer chromatography (TLC) and the radioactivity of the reaction 
product at the PI4P-position was assayed using an analyzer (trade name: Bio-Image; product of Fuji Photo Film Co., 
Ltd.). 

Fig. 1 shows the results. Fig. 1 is an analytical diagram of the results of assaying the radioactivity based on TLC as
mentioned above. The right lane (2) is the fraction of the cytoplasm of Sf-9 cells infected with the NPIK-containing virus,
whereas the left lane (1) is the fraction of uninfected Sf-9 cell cytoplasm.

Also, predetermined amounts of Triton X-100 and adenosine were added to the above reaction system to check
how such addition would affect the PI4 Kinase activity. The PI4 Kinase activity was assayed in the same manner as
above.

Fig. 2 shows the results. The results confirmed that NPIK had a typical PI4 Kinase activity accelerated by Triton
X-100 and inhibited by adenosine.

Example 10

nel-related protein type 1 (NRP1) gene and nel-related protein type 2 (NRP2) gene 

(1) Cloning and DNA sequencing of NRP1 gene and NRP2 gene 

EGF-like repeats have been found in many membrane proteins and in proteins related to growth regulation and
differentiation. This motif seems to be involved in protein-protein interactions.

Recently, a gene encoding nel, a novel peptide containing five EGF-like repeats, was cloned from a chick embryonic
cDNA library [Matsuhashi, S., et al., Dev. Dynamics, 203, 212-222 (1995)]. This product is considered to be a
transmembrane molecule with its EGF-like repeats in the extracellular domain. A 4.5 kb transcript (nel mRNA) is
expressed in various tissues at the embryonic stage and exclusively in brain and retina after hatching.

Following the procedure of Example 1 (1), cDNA clones were randomly selected from a human fetal brain cDNA
library and subjected to sequence analysis, followed by database searching. As a result, two cDNA clones with
significantly high homology to the above-mentioned nel were found and named GEN-073E07 and GEN-093E05, respectively.

Since both clones were lacking in the 5' portion, 5' RACE was performed in the same manner as in Example 2 (2)
to obtain the entire coding regions.

As for the primers for 5' RACE, primers having an arbitrary sequence obtained from the cDNA sequences of the 
above clones were synthesized while the anchor primer attached to a commercial kit was used as such. 

5' RACE clones obtained from the PCR were sequenced and the sequences seemingly covering the entire coding
regions of both genes were obtained. These genes were respectively named nel-related protein type 1 (NRP1) gene
and nel-related protein type 2 (NRP2) gene.

The NRP1 gene contains an open reading frame of 2,430 nucleotides, as shown under SEQ ID NO:35, the amino
acid sequence deduced therefrom comprises 810 amino acid residues, as shown under SEQ ID NO:34, and the
nucleotide sequence of the entire cDNA clone of said NRP1 gene comprises 2,977 nucleotides, as shown under SEQ ID
NO:36.

On the other hand, the NRP2 gene contains an open reading frame of 2,448 nucleotides, as shown under SEQ ID
NO:38, the amino acid sequence deduced therefrom comprises 816 amino acid residues, as shown under SEQ ID
NO:37, and the nucleotide sequence of the entire cDNA clone of said NRP2 gene comprises 3,198 nucleotides, as
shown under SEQ ID NO:39.

Furthermore, the coding regions were amplified by RT-PCR to exclude the possibility that either of the sequences
obtained was a chimeric cDNA.

The deduced NRP1 and NRP2 gene products both showed highly hydrophobic N termini capable of functioning as
signal peptides for membrane insertion. Unlike chick embryonic nel, they both appeared to have no hydrophobic
transmembrane domain. Comparison among NRP1, NRP2 and nel with respect to the deduced peptide
sequences revealed that NRP2 has 80% homology to nel on the amino acid level and is more closely related to nel
than NRP1, which has 50% homology. The cysteine residues in cysteine-rich domains and EGF-like repeats were found
completely conserved.

The most remarkable difference between the NRPs and the chick protein was that the human homologs lack the
putative transmembrane domain of nel. However, even in this lacking region, the nucleotide sequences of NRPs were
very similar to that of nel. Furthermore, the two NRPs each possessed six EGF-like repeats, whereas nel has only five.
Other unique motifs of nel as reported by Matsuhashi et al. [Matsuhashi, S., et al., Dev. Dynamics, 203, 212-222
(1995)] were also found in the NRPs at equivalent positions. Since, as mentioned above, it was shown that the two
deduced NRP peptides are not transmembrane proteins, the NRPs might be secretory proteins or proteins anchored to
membranes as a result of posttranslational modification.

The present inventors speculate that NRPs might function as ligands by stimulating other molecules such as EGF
receptors. The present inventors further found that an extra EGF-like repeat could be encoded in nel upon frame shifting
of the membrane domain region of nel.

When aligned and compared with NRP2 and nel, the frame-shifted amino acid sequence showed similarities
over the whole range of NRP2 and of nel, suggesting that NRP2 might be a human counterpart of nel. In contrast,
NRP1 is considered to be not a human counterpart of nel but a homologous gene.

(2) Northern blot analysis 

Northern blot analysis was carried out as described in Example 1 (2). Thus, the entire sequences of both cDNA
clones were amplified by PCR, the PCR products were purified and labeled with [32P]-dCTP (random-primed DNA
labeling kit, Boehringer Mannheim) and human normal tissues were examined for NRP mRNA expression using an
MTN blot with the labeled products as two probes.

Sixteen adult tissues and four human fetal tissues were examined for the expression pattern of two NRPs. 

As a result of the Northern blot analysis, it was found that a 3.5 kb transcript of NRP1 was weakly expressed in fetal
and adult brain and kidney. A 3.6 kb transcript of NRP2 was strongly expressed in adult and fetal brain alone, with weak
expression thereof in fetal kidney as well.

This suggests that NRPs might play a brain-specific role, for example as signal molecules for growth regulation. In 
addition, these genes might have a particular function in kidney. 

(3) Chromosomal mapping of NRP1 gene and NRP2 gene by FISH

Chromosomal mapping of the NRP1 gene and NRP2 gene was performed by FISH as described in Example 1 (3). 
As a result, it was revealed that the chromosomal locus of the NRP1 gene is localized to 11p15.1-p15.2 and the 
chromosomal locus of the NRP2 gene to 12q13.11-q13.12. 

According to the present invention, the novel human NRP1 gene and NRP2 gene are provided and the use of said
genes makes it possible to detect the expression of said genes in various tissues and produce the human NRP1 and
NRP2 proteins by the technology of genetic engineering. They can further be used in the study of the brain
neurotransmission system, diagnosis of various diseases related to neurotransmission in the brain, and the screening and
evaluation of drugs for the treatment and prevention of such diseases. Furthermore, the possibility is suggested that these
EGF domain-containing NRPs act as growth factors in brain, hence they may be useful in the diagnosis and treatment
of various kinds of intracerebral tumor and effective in nerve regeneration in cases of degenerative nervous diseases.

Example 11 

GSPT1-related protein (GSPT1-TK) gene

(1) GSPT1-TK gene cloning and DNA sequencing 

The human GSPT1 gene is one of the human homologous genes of the yeast GST1 gene that encodes the GTP-binding
protein essential for the G1 to S phase transition in the cell cycle. The yeast GST1 gene, first identified as a
gene capable of complementing a temperature-sensitive gst1 (G1-to-S transition) mutant of Saccharomyces cerevisiae,
was isolated from a yeast genomic library [Kikuchi, Y., Shimatake, H. and Kikuchi, A., EMBO J., 7, 1175-1182
(1988)] and encoded a protein with a target site of cAMP-dependent protein kinases and a GTPase domain.

EP 0 796 913 A2

The human GSPT1 gene was isolated from a KB cell cDNA library by hybridization using the yeast GST1 gene as
a probe [Hoshino, S., Miyazawa, H., Enomoto, T., Hanaoka, F., Kikuchi, Y., Kikuchi, A. and Ui, M., EMBO J., 8, 3807-
3814 (1989)]. The deduced protein of said GSPT1 gene, like yeast GST1, has a GTP-binding domain and a GTPase
activity center, and plays an important role in cell proliferation.

Furthermore, a breakpoint for chromosome re-arrangement has been observed in the GSPT1 gene located in the
chromosomal locus 16p13.3 in patients with acute nonlymphocytic leukemia (ANLL) [Ozawa, K., Murakami, Y., Eki, T.,
Yokoyama, K., Soeda, E., Hoshino, S., Ui, M. and Hanaoka, F., Somatic Cell and Molecular Genet., 18, 189-194 (1992)].

cDNA clones were randomly selected from a human fetal brain cDNA library and subjected to sequence analysis
as described in Example 1 (1) and database searching was performed and, as a result, a clone having a 0.3 kb cDNA
sequence highly homologous to the above-mentioned GSPT1 gene was found and named GEN-077A09. The
GEN-077A09 clone seemed to be lacking in the 5' region, so that 5' RACE was carried out in the same manner as in Example
2 (2) to obtain the entire coding region.

The primers used for the 5' RACE were P1 and P2 primers respectively having the nucleotide sequences shown in
Table 11 as designed based on the cDNA sequence of the above-mentioned clone, and the anchor primer used
was the one attached to the commercial kit. Thirty-five cycles of PCR were performed under the following conditions:
94°C for 45 seconds, 58°C for 45 seconds and 72°C for 2 minutes. Finally, elongation reaction was carried out at 72°C
for 7 minutes.
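
The cycling conditions above can be written down as a small data structure, which also gives the total programmed hold time. This is a sketch only: the class and function names are illustrative, and ramp times between temperatures, which depend on the instrument, are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: int
    seconds: int


CYCLE = (Step(94, 45), Step(58, 45), Step(72, 120))  # denature, anneal, extend
CYCLES = 35
FINAL_EXTENSION = Step(72, 7 * 60)


def programmed_seconds(cycle, n_cycles, final_step):
    """Total hold time of the program, excluding temperature ramps."""
    return n_cycles * sum(step.seconds for step in cycle) + final_step.seconds


total = programmed_seconds(CYCLE, CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.1f} min")
```

The program therefore holds for 7,770 seconds in total, or about 130 minutes before ramping is taken into account.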



Table 11

Primer       Nucleotide sequence
P1 primer    5'-GATTTGTGCTCAATAATCACTATCTGAA-3'
P2 primer    5'-GGTTACTAGGATCACAAAGTATGAATTCTGGAA-3'




Several of the 5' RACE clones obtained from the above PCR were sequenced and the base sequence of that cDNA
clone showing overlapping between the 5' RACE clones and the GEN-077A09 clone was determined to thereby reveal
the sequence regarded as covering the entire coding region. This was named the GSPT1-related protein (GSPT1-TK)
gene.

The GSPT1-TK gene was found to contain an open reading frame of 1,497 nucleotides, as shown under SEQ ID
NO:41. The amino acid sequence deduced therefrom contained 499 amino acid residues, as shown under SEQ ID
NO:40.

The nucleotide sequence of the whole cDNA clone of the GSPT1-TK gene was found to comprise 2,057 nucleotides,
as shown under SEQ ID NO:42, and the molecular weight of the deduced protein was calculated at 55,740 daltons.

The first methionine codon (ATG) in the open reading frame was not preceded by an in-frame termination codon, but
this ATG was surrounded by a sequence similar to the Kozak consensus sequence for translational initiation. Therefore,
it was concluded that this ATG triplet occurring in positions 144-146 of the relevant sequence is the initiation codon.

Furthermore, a polyadenylation signal, AATAAA, was observed 13 nucleotides upstream from the polyadenylation
site.

Human GSPT1-TK contains a glutamic acid rich region near the N terminus, and 18 of 20 glutamic acid residues
occurring in this region of human GSPT1-TK are conserved and align perfectly with those of the human GSPT1 protein.
Several regions (G1, G2, G3, G4 and G5) of GTP-binding proteins that are responsible for guanine nucleotide binding
and hydrolysis were found conserved in the GSPT1-TK protein just as in the human GSPT1 protein.

Thus, the DNA sequence of human GSPT1-TK was found 89.4% identical, and the amino acid sequence deduced
therefrom 92.4% identical, with the corresponding sequence of human GSPT1 which supposedly plays an important
role in the G1 to S phase transition in the cell cycle. Said amino acid sequence showed 50.8% identity with that of yeast
GST1.
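
For readers unfamiliar with how identity percentages of this kind are obtained, the following sketch computes percent identity from a pairwise alignment. The alignment strings are invented toy data, not sequences from this document, and the choice of denominator (aligned columns without gaps) is an assumption; the source does not state which program or convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical positions among aligned, ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical): one gapped column and one mismatch.
toy_a = "ATGGAG-TTG"
toy_b = "ATGGAAGTTG"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give slightly different figures, so reported identities are comparable only when the same convention is used.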

(2) Northern blot analysis

Northern blot analysis was carried out as described in Example 1 (2). Thus, the GEN-077A09 cDNA clone was
amplified by PCR, the PCR product was purified and labeled with [32P]-dCTP (random-primed DNA labeling kit,
Boehringer Mannheim), and normal human tissues were examined for the expression of GSPT1-TK mRNA therein using
an MTN blot with the labeled product as a probe.

As a result of the Northern blot analysis, a 2.7 kb major transcript was detected in various tissues. The level of
human GSPT1-TK expression seemed highest in brain and in testis.

(3) Chromosome mapping of GSPT1-TK gene by FISH 

Chromosome mapping of the GSPT1-TK gene was performed by FISH as described in Example 1 (3). 

As a result, it was found that the GSPT1-TK gene is localized at the chromosomal locus 19p13.3. In this
chromosomal localization site, reciprocal translocation has been observed very frequently in cases of acute lymphocytic
leukemia (ALL) and acute myeloid leukemia (AML). In addition, it is reported that acute non-lymphocytic leukemia (ANLL)
is associated with re-arrangements involving the human GSPT1 region [Ozawa, K., Murakami, Y., Eki, T., Yokoyama, K.,
Soeda, E., Hoshino, S., Ui, M. and Hanaoka, F., Somatic Cell and Molecular Genet., 18, 189-194 (1992)].

In view of the above, it is suggested that this gene is the best candidate gene associated with ALL and AML. 

In accordance with the present invention, the novel human GSPT1-TK gene is provided and the use of said gene
makes it possible to detect the expression of said gene in various tissues and produce the human GSPT1-TK protein
by the technology of genetic engineering. These can be used in the studies of cell proliferation, as mentioned above,
and further make it possible to diagnose various diseases associated with the chromosomal locus of this gene, for
example acute myelocytic leukemia. This is because translocation of this gene may result in disruption of the
GSPT1-TK gene and, further, some fusion protein expressed upon said translocation may cause such diseases.

Furthermore, it is expected that diagnosis and treatment of said diseases can be made possible by producing
antibodies to such fusion protein, revealing the intracellular localization of said protein and examining its expression
specific to said diseases. Therefore, it is also expected that the use of the gene of the present invention makes it
possible to screen and evaluate drugs for the treatment and prevention of said diseases.

SEQUENCE LISTING



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 122 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



Met Glu Leu Gly Glu Asp Gly Ser Val Tyr Lys Ser Ile Leu Val Thr
1               5                   10                  15

Ser Gln Asp Lys Ala Pro Ser Val Ile Ser Arg Val Leu Lys Lys Asn
            20                  25                  30

Asn Arg Asp Ser Ala Val Ala Ser Glu Tyr Glu Leu Val Gln Leu Leu
        35                  40                  45

Pro Gly Glu Arg Glu Leu Thr Ile Pro Ala Ser Ala Asn Val Phe Tyr
    50                  55                  60

Pro Met Asp Gly Ala Ser His Asp Phe Leu Leu Arg Gln Arg Arg Arg
65                  70                  75                  80

Ser Ser Thr Ala Thr Pro Gly Val Thr Ser Gly Pro Ser Ala Ser Gly
                85                  90                  95

Thr Pro Pro Ser Glu Gly Gly Gly Gly Ser Phe Pro Arg Ile Lys Ala
            100                 105                 110

Thr Gly Arg Lys Ile Ala Arg Ala Leu Phe
        115                 120



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 366 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (cDNA)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



ATGGAGTTGG GGGAAGATGG CAGTGTCTAT AAGAGCATTT TGGTGACAAG CCAGGACAAG
GCTCCAAGTG TCATCAGTCG TGTCCTTAAG AAAAACAATC GTGACTCTGC AGTGGCTTCA
GAGTATGAGC TGGTACAGCT GCTACCAGGG GAGCGAGAGC TGACTATCCC AGCCTCGGCT
AATGTATTCT ACCCCATGGA TGGAGCTTCA CACGATTTCC TCCTGCGGCA GCGGCGAAGG
TCCTCTACTG CTACACCTGG CGTCACCAGT GGCCCGTCTG CCTCAGGAAC TCCTCCGAGT
GAGGGAGGAG GGGGCTCCTT TCCCAGGATC AAGGCCACAG GGAGGAAGAT TGCACGGGCA
CTCTTC
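
The relationship between SEQ ID NO:2 (coding sequence) and SEQ ID NO:1 (deduced amino acid sequence) can be checked by translating the former with the standard genetic code. The following is a minimal sketch; the nucleotide string is this listing's SEQ ID NO:2 with characters that are ambiguous in the printed copy resolved against SEQ ID NO:1 and the CDS of SEQ ID NO:3, so it should be treated as an assumption rather than an authoritative sequence record. The variable and function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# SEQ ID NO:2 as read from the listing (assumed reading, see lead-in).
SEQ_ID_NO_2 = (
    "ATGGAGTTGG GGGAAGATGG CAGTGTCTAT AAGAGCATTT TGGTGACAAG CCAGGACAAG "
    "GCTCCAAGTG TCATCAGTCG TGTCCTTAAG AAAAACAATC GTGACTCTGC AGTGGCTTCA "
    "GAGTATGAGC TGGTACAGCT GCTACCAGGG GAGCGAGAGC TGACTATCCC AGCCTCGGCT "
    "AATGTATTCT ACCCCATGGA TGGAGCTTCA CACGATTTCC TCCTGCGGCA GCGGCGAAGG "
    "TCCTCTACTG CTACACCTGG CGTCACCAGT GGCCCGTCTG CCTCAGGAAC TCCTCCGAGT "
    "GAGGGAGGAG GGGGCTCCTT TCCCAGGATC AAGGCCACAG GGAGGAAGAT TGCACGGGCA "
    "CTCTTC"
).replace(" ", "")

# SEQ ID NO:1 in one-letter code, 16 residues per row as in the listing.
SEQ_ID_NO_1 = (
    "MELGEDGSVYKSILVT" "SQDKAPSVISRVLKKN" "NRDSAVASEYELVQLL"
    "PGERELTIPASANVFY" "PMDGASHDFLLRQRRR" "SSTATPGVTSGPSASG"
    "TPPSEGGGGSFPRIKA" "TGRKIARALF"
)


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon ('*' marks a stop)."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = translate(SEQ_ID_NO_2)
print(len(SEQ_ID_NO_2), "nt ->", len(protein), "residues")
print("matches SEQ ID NO:1:", protein == SEQ_ID_NO_1)
```

The same check applies to any CDS and protein pair in the listing, given the CDS location stated in the corresponding FEATURE entry.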



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 842 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-501D08 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 28..393

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



CCCACGAGCC GTATCATCCG AGTCCAG ATG GAG TTG GGG GAA GAT GGC AGT
                              Met Glu Leu Gly Glu Asp Gly Ser
                              1               5

GTC TAT AAG AGC ATT TTG GTG ACA AGC CAG GAC AAG GCT CCA AGT GTC
Val Tyr Lys Ser Ile Leu Val Thr Ser Gln Asp Lys Ala Pro Ser Val
    10                  15                  20

ATC AGT CGT GTC CTT AAG AAA AAC AAT CGT GAC TCT GCA GTG GCT TCA
Ile Ser Arg Val Leu Lys Lys Asn Asn Arg Asp Ser Ala Val Ala Ser
25                  30                  35                  40

GAG TAT GAG CTG GTA CAG CTG CTA CCA GGG GAG CGA GAG CTG ACT ATC
Glu Tyr Glu Leu Val Gln Leu Leu Pro Gly Glu Arg Glu Leu Thr Ile
                45                  50                  55

CCA GCC TCG GCT AAT GTA TTC TAC CCC ATG GAT GGA GCT TCA CAC GAT
Pro Ala Ser Ala Asn Val Phe Tyr Pro Met Asp Gly Ala Ser His Asp
            60                  65                  70

TTC CTC CTG CGG CAG CGG CGA AGG TCC TCT ACT GCT ACA CCT GGC GTC
Phe Leu Leu Arg Gln Arg Arg Arg Ser Ser Thr Ala Thr Pro Gly Val
        75                  80                  85

ACC AGT GGC CCG TCT GCC TCA GGA ACT CCT CCG AGT GAG GGA GGA GGG
Thr Ser Gly Pro Ser Ala Ser Gly Thr Pro Pro Ser Glu Gly Gly Gly
    90                  95                  100

GGC TCC TTT CCC AGG ATC AAG GCC ACA GGG AGG AAG ATT GCA CGG GCA
Gly Ser Phe Pro Arg Ile Lys Ala Thr Gly Arg Lys Ile Ala Arg Ala
105                 110                 115                 120

CTC TTC TGAGGAGGAA GCX3XTTTTT TTACAGAACT CATGGTGTTC ATAOCAGATG
Leu Phe



TGGGTAG0CA TOCTGAATGG TGGCAATTAT ATCACATTGA GACAGAAATT CAGAAAGGGA
GOCAGOCAOC CTGGGGCAGT GAAGTGOCAC TQGTTTACCA GACAGCTGAG AAATOCAGOC
CTGTOGGAAC TGGTGTCTTA TAACCAAGTT GGATAOCTGT CTATAGCTTG 0CA0CTT0CA
TGAGTGCAGC ACACAGCTAG TGCTGGAAAA AOGCATCACT TTCTGATTCT TGGOCATATC
CTAACATQCA AGGGOCAAGC AAAGGCTTCA AGGCTCTGAG OOOCAGQGCA GAGGGGAATG
GCAAAATGTA GGTCCTQGCA GGAGCTCTTC TTOOCACTCT GGGGCTTTCT ATCACTGTGA
CAACACTAAG ATAATAAAOC AAAACACTAC CTGAATOCT



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 193 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 



Met Glu Leu Glu Leu Tyr Gly Val Asp Asp Lys Phe Tyr Ser Lys Leu
1               5                   10                  15

Asp Gln Glu Asp Ala Leu Leu Gly Ser Tyr Pro Val Asp Asp Gly Cys
            20                  25                  30

Arg Ile His Val Ile Asp His Ser Gly Ala Arg Leu Gly Glu Tyr Glu
        35                  40                  45

Asp Val Ser Arg Val Glu Lys Tyr Thr Ile Ser Gln Glu Ala Tyr Asp
    50                  55                  60

Gln Arg Gln Asp Thr Val Arg Ser Phe Leu Lys Arg Ser Lys Leu Gly
65                  70                  75                  80

Arg Tyr Asn Glu Glu Glu Arg Ala Gln Gln Glu Ala Glu Ala Ala Gln
                85                  90                  95

Arg Leu Ala Glu Glu Lys Ala Gln Ala Ser Ser Ile Pro Val Gly Ser
            100                 105                 110

Arg Cys Glu Val Arg Ala Ala Gly Gln Ser Pro Arg Arg Gly Thr Val
        115                 120                 125

Met Tyr Val Gly Leu Thr Asp Phe Lys Pro Gly Tyr Trp Ile Gly Val
    130                 135                 140

Arg Tyr Asp Glu Pro Leu Gly Lys Asn Asp Gly Ser Val Asn Gly Lys
145                 150                 155                 160

Arg Tyr Phe Glu Cys Gln Ala Lys Tyr Gly Ala Phe Val Lys Pro Ala
                165                 170                 175

Val Val Thr Val Gly Asp Phe Pro Glu Glu Asp Tyr Gly Leu Asp Glu
            180                 185                 190

Ile



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 579 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA(cDNA) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ATGGAACTGG AGCTGTATGG AGTTGACGAC AAGTTCTACA GCAAGCTGGA TCAAGAGGAT
GCGCTCCTGG GCTCCTACCC TGTAGATGAC GGCTGCCGCA TCCACGTCAT TGACCACAGT
GGCGCCCGCC TTGGTGAGTA TGAGGACGTG TCCCGGGTGG AGAAGTACAC GATCTCACAA
GAAGCCTACG ACCAGAGGCA AGACACGGTC CGCTCTTTCC TGAAGCGCAG CAAGCTCGGC
CGGTACAACG AGGAGGAGCG GGCTCAGCAG GAGGCCGAGG CCGCCCAGCG CCTGGCCGAG
GAGAAGGCCC AGGCCAGCTC CATCCCCGTG GGCAGCCGCT GTGAGGTGCG GGCGGCGGGA
CAATCCCCTC GCCGGGGCAC CGTCATGTAT GTAGGTCTCA CAGATTTCAA GCCTGGCTAC
TGGATTGGTG TCCGCTATGA TGAGCCACTG GGGAAAAATG ATGGCAGTGT GAATGGGAAA
CGCTACTTCG AATGCCAGGC CAAGTATGGC GCCTTTGTCA AGCCAGCAGT CGTGACGGTG
GGGGACTTCC CGGAGGAGGA CTACGGGTTG GACGAGATA

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1015 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-080G01 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 274..852
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

TGATTQCTCA GGCAOGGAGC AGGAGQOGGG CTGATAGCCC AGCAGCAGCA GOGGCGGCGG 60 

CQGCTOOGGA GOQQCTCTGA GGCGGCTGGA CTX30GCTGCA GGCATOOGCG GGOGOGGGAA 120 

GATGGAGGTG AGGGGGGTGT COGCACCACG GTCACXX3TTT TCATCAGCAG CTOOCTCAGC 180 

AOCTTOOQCT COGAGAAGOG ATACAGOOGC AGOCTCAOCA TCGCTGAGTT CAACTGTAAA 240 

« CTQGAGITGC TGGTGGGCAG OCCTGCTTOC TGC ATG GAA CTG GAG CTG TAT GGA 294 

Met Glu Leu Glu Leu Tyr Gly 
1 5 

CTT GAC GAC AAG TTC TAC AGC AAG CTG GAT CAA GAG GAT GOG CIC CTG 342 
Val Asp Asp Lys Phe Tyr Ser Lys Leu Asp Gln Glu Asp Ala Leu Leu
10 15 20 

GGC TOC TAC OCT GTA GAT GAC GGC TGC CGC ATC CAC GTC ATT GAC CAC 390 
Gly Ser Tyr Pro Val Asp Asp Gly Cys Arg Ile His Val Ile Asp His
25 30 35 



ACT, GGC GCC CGC CTT GGT GAG TAT GAG GAC CTG TOC CGG GTG GAG AAG 438 
Ser Gly Ala Arg Leu Gly Glu Tyr Glu Asp Val Ser Arg Val Glu Lys 
40 45 50 55 

TAC AOG ATC TCA CAA GAA GCC TAC GAC CAG AGG CAA GAC AOG GTC CGC 486 
Tyr Thr Ile Ser Gln Glu Ala Tyr Asp Gln Arg Gln Asp Thr Val Arg
60 65 70 

TCT TTC CTG AAG CGC AGC AAG CTC GGC CGG TAC AAC GAG GAG GAG CGG 534 
Ser Phe Leu Lys Arg Ser Lys Leu Gly Arg Tyr Asn Glu Glu Glu Arg 
75 80 85 

GCT CAG CAG GAG GCC GAG GCC GCC CAG CGC CTG GGC GAG GAG AAG GCC 582 
Ala Gln Gln Glu Ala Glu Ala Ala Gln Arg Leu Ala Glu Glu Lys Ala
90 95 100 

CAG GCC AGC TOC ATC CCC GTG GGC AGC GGC TGT GAG GTG CGG GOG GOG 630 
Gln Ala Ser Ser Ile Pro Val Gly Ser Arg Cys Glu Val Arg Ala Ala
105 110 115

GGA CAA TOC OCT CGC CGG GGC AOC GTC ATG TAT GTA GGT CTC ACA GAT 678 
Gly Gln Ser Pro Arg Arg Gly Thr Val Met Tyr Val Gly Leu Thr Asp
120 125 130 135 

TTC AAG OCT GGC TAC TGG ATT GGT GTC OGC TAT GAT GAG CCA CTG GGG 726
Phe Lys Pro Gly Tyr Trp Ile Gly Val Arg Tyr Asp Glu Pro Leu Gly
140 145 150 

AAA AAT GAT GGC AGT GTG AAT GGG AAA CGC TAG TTC GAA TGC CAG GOC 
Lys Asn Asp Gly Ser Val Asn Gly Lys Arg Tyr Phe Glu Cys Gln Ala
155 160 165 

AAG TAT GOC GOC TTT GTC AAG OCA GCA GTC GTG AOG GTG GGG GAC TTC 
Lys Tyr Gly Ala Phe Val Lys Pro Ala Val Val Thr Val Gly Asp Phe 
170 175 180

COG GAG GAG GAC TAC GGG TTG GAC GAG ATA TGACAOCTAA GGAATTOOOC 
Pro Glu Glu Asp Tyr Gly Leu Asp Glu Ile
185 190 

TGCTTCAGCT CCTAGCTCAG OCACTGACTG OOOCTOCTGT GTGTGOCCAT GGOOCTTTTC 

T0CTGAO00C ATTTTAATTT TATTCATTTT TTOCTTTGOC ATTGATTTTT GAGACTCATG 

CATTAAAITC ACTAGAAAOC CAG 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 128 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



Met Thr Glu Ala Asp Val Asn Pro Lys Ala Tyr Pro Leu Ala Asp Ala
1 5 10 15

His Leu Thr Lys Lys Leu Leu Asp Leu Val Gln Gln Ser Cys Asn Tyr
20 25 30

Lys Gln Leu Arg Lys Gly Ala Asn Glu Ala Thr Lys Thr Leu Asn Arg
35 40 45

Gly Ile Ser Glu Phe Ile Val Met Ala Ala Asp Ala Glu Pro Leu Glu
50 55 60

Ile Ile Leu His Leu Pro Leu Leu Cys Glu Asp Lys Asn Val Pro Tyr
65 70 75 80

Val Phe Val Arg Ser Lys Gln Ala Leu Gly Arg Ala Cys Gly Val Ser
85 90 95

Arg Pro Val Ile Ala Cys Ser Val Thr Ile Lys Glu Gly Ser Gln Leu
100 105 110

Lys Gln Gln Ile Gln Ser Ile Gln Gln Ser Ile Glu Arg Leu Leu Val
115 120 125



(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 384 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA( genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



ATGACTGAGG CTGATGTGAA T0CAAAGG0C TAT000CTTG CCGATOOOCA OCTCADCAAG 60 

AAGCTACTOG ACCTOGTTCA GCAGTCATGT AACTATAAGC AGCTTOQGAA AGGAGOCAAT 120 

GAGG0CACCA AAAOGCTCAA CAGGGGCATC TCTGACTTCA TOGTGATGGC TGCAGACGOC 180 

GAGCCACTGG AGATCATTCT GCADCTG00G CTQCTGTCTG AAGACAAGAA TCTGOOCTAC 240 

GTOnTGTGC GCTOCAAGCA GGOOCTGQQG AGAGOCTGTG GGCTCTOCAG GCCTGTCATC 300 

GOCTGITCTG TCACCATCAA AGAAGGCTOG CAGCTGAAAC AGCAGATOCA ATOCATTCAG 360 

CAGTOCATTG AAAGGCTCTT AGTC 384

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1493 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA( genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE:

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-025F07 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 95..478

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



ATOOGTGTCC TTGOGGTGCT GGGCAGCAGA COGTOCAAAC CGACAOGCGT GGTATOCTOG 

OGGTCTOOGG CAAGAGACTA CCAAGACAGA CGCT ATG ACT GAG GCT GAT GTG 

Met Thr Glu Ala Asp Val 
1 5 

AAT CCA AAG GCC TAT COC CTT GOC GAT GOC CAC CTC ACC AAG AAG CTA 
Asn Pro Lys Ala Tyr Pro Leu Ala Asp Ala His Leu Thr Lys Lys Leu 
10 15 20 

CTG GAC CTC CTT CAG CAG TCA TOT AAC TAT AAG CAG CTT CGG AAA GGA 
Leu Asp Leu Val Gln Gln Ser Cys Asn Tyr Lys Gln Leu Arg Lys Gly
25 30 35 

GOC AAT GAG GCC ACC AAA ACC CTC AAC AGG GOC ATC TCT GAG TTC ATC 
Ala Asn Glu Ala Thr Lys Thr Leu Asn Arg Gly Ile Ser Glu Phe Ile
40 45 50 

GTG ATG GCT GCA GAC GOC GAG CCA CTG GAG ATC ATT CTG CAC CTG OOG 
Val Met Ala Ala Asp Ala Glu Pro Leu Glu Ile Ile Leu His Leu Pro
55 60 65 70 

CTG CTG TGT GAA GAC AAG AAT GTG COC TAC GTG TTT GIG GOC TOC AAG 
Leu Leu Cys Glu Asp Lys Asn Val Pro Tyr Val Phe Val Arg Ser Lys 
75 80 85 

CAG GOC CTG GGG AGA GOC TGT GGG CTC TCC AGG OCT GTC ATC GCC TGT 
Gln Ala Leu Gly Arg Ala Cys Gly Val Ser Arg Pro Val Ile Ala Cys
90 95 100 

TCT GTC ACC ATC AAA GAA GOC TOG CAG CTG AAA CAG CAG ATC CAA TOC 
Ser Val Thr Ile Lys Glu Gly Ser Gln Leu Lys Gln Gln Ile Gln Ser
105 110 115 

ATT CAG CAG TOC ATT GAA AGG CTC TTA GTC TAAACCTGPG GCCTCTGOCA 
Ile Gln Gln Ser Ile Glu Arg Leu Leu Val
120 125 

CGTGCTOOCT GCX^GCTTOC CCOCTGAGGT TGTGTATCAT ATTATCTGTG TTAGCATGTA

CTATTTTCAG CTACTCTCTA TTGTTATAAA ATGTACTACT AAATCTGGTT TCTGGATTTT 618

TOK7TTGTTT TTGTTCTGTT TTACAGGCTT GCTATOOOOC TTOCTTTCCT COCTOCCTCT 678 

GOCATOCTTC ATOCTTTTAT CCTCOCTTTT TGGAACAAGT GTTCAGAGCA GACAGAAGCA 738 

GGCTOGTOQC AOOGITGAAA GQCAGAAAGA GOCAGGAGAA AGCTGATQGA GCCAGGACAG 798 

AGATCTGGTT CCAGCTTTCA OXACTAGCT TCCTGTTCIG TGCGGGGFGT GCTGGAATTA 858

AACAGCATTC ATTCTCTCTC CCTGTGCCTG GCACACAGAA TCATTCATAC GTGTTCAAGT 918 

GATCAAQQGG TTTCATTTGC TCTTGQGGGA TTAGCTATCA TTTGGGGAGG AAGCATGTCT 978 

TCTGTGAGGT TGTTOGGCTA TGTCCAAGTG TOGTTTACTA ATCTAGOCCT GCTGTTTGCT 1038 

TTTGGTAATG TGATGTTGAT GTTCTOOOOC TAOOCACAAC CATGOOCTTG AGGGTAGCAG 1098 

GGCAQCAGCA TACCAAAGAG ATGIGCTGCA GGACTOCGGA GGCAGCCTGG GTGGCTGAGC 1158

CATQQGGCAG TTGAOCTOGG TCTTGAAAGA GTOGGGAGTG ACAAGCTCAG AGAGCATGAA 1218 

CTGATGCTGG CATGAAGGAT TOCAGGAAGA TCATGGAGAC CTGGCTGGTA GCTGTAACAG 1278 

AGATGGTOGA CTOCAAGGAA ACAGOCTGTC TCTGGTGAAT GGGACTTTCT TTGGTGGACA 1338 

CITOGCACCA GCTCTGAGAG OOCTTOOOCT GTOTOCTGOC AOCATGTGGG TCAGATGTAC 1398 

TCTCTGTCAC ATGAGGAGAG TGCTAGTTCA TGTGTTCTOC ATTCTTGTGA GCATOCTAAT 1458

AAATCTGTTC CATTTTGAAA AAAAAAAAAA AAAAA 1493 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 711 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Pro Ala Asp Val Asn Leu Ser Gln Lys Pro Gln Val Leu Gly Pro
1 5 10 15

Glu Lys Gln Asp Gly Ser Cys Glu Ala Ser Val Ser Phe Glu Asp Val
20 25 30

Thr Val Asp Phe Ser Arg Glu Glu Trp Gln Gln Leu Asp Pro Ala Gln
35 40 45

Arg Cys Leu Tyr Arg Asp Val Met Leu Glu Leu Tyr Ser His Leu Phe
50 55 60

Ala Val Gly Tyr His Ile Pro Asn Pro Glu Val Ile Phe Arg Met Leu
65 70 75 80

Lys Glu Lys Glu Pro Arg Val Glu Glu Ala Glu Val Ser His Gln Arg
85 90 95

Cys Gln Glu Arg Glu Phe Gly Leu Glu Ile Pro Gln Lys Glu Ile Ser
100 105 110

Lys Lys Ala Ser Phe Gln Lys Asp Met Val Gly Glu Phe Thr Arg Asp
115 120 125

Gly Ser Trp Cys Ser Ile Leu Glu Glu Leu Arg Leu Asp Ala Asp Arg
130 135 140

Thr Lys Lys Asp Glu Gln Asn Gln Ile Gln Pro Met Ser His Ser Ala
145 150 155 160

Phe Phe Asn Lys Lys Thr Leu Asn Thr Glu Ser Asn Cys Glu Tyr Lys
165 170 175

Asp Pro Gly Lys Met Ile Arg Thr Arg Pro His Leu Ala Ser Ser Gln
180 185 190

Lys Gln Pro Gln Lys Cys Cys Leu Phe Thr Glu Ser Leu Lys Leu Asn
195 200 205

Leu Glu Val Asn Gly Gln Asn Glu Ser Asn Asp Thr Glu Gln Leu Asp
210 215 220

Asp Val Val Gly Ser Gly Gln Leu Phe Ser His Ser Ser Ser Asp Ala
225 230 235 240

Cys Ser Lys Asn Ile His Thr Gly Glu Thr Phe Cys Lys Gly Asn Gln
245 250 255

Cys Arg Lys Val Cys Gly His Lys Gln Ser Leu Lys Gln His Gln Ile
260 265 270

His Thr Gln Lys Lys Pro Asp Gly Cys Ser Glu Cys Gly Gly Ser Phe
275 280 285



Thr Gln Lys Ser His Leu Phe Ala Gln Gln Arg Ile His Ser Val Gly
290 295 300

Asn Leu His Glu Cys Gly Lys Cys Gly Lys Ala Phe Met Pro Gln Leu
305 310 315 320

Lys Leu Ser Val Tyr Leu Thr Asp His Thr Gly Asp Ile Pro Cys Ile
325 330 335

Cys Lys Glu Cys Gly Lys Val Phe Ile Gln Arg Ser Glu Leu Leu Thr
340 345 350

His Gln Lys Thr His Thr Arg Lys Lys Pro Tyr Lys Cys His Asp Cys
355 360 365

Gly Lys Ala Phe Phe Gln Met Leu Ser Leu Phe Arg His Gln Arg Thr
370 375 380

His Ser Arg Glu Lys Leu Tyr Glu Cys Ser Glu Cys Gly Lys Gly Phe
385 390 395 400

Ser Gln Asn Ser Thr Leu Ile Ile His Gln Lys Ile His Thr Gly Glu
405 410 415

Arg Gln Tyr Ala Cys Ser Glu Cys Gly Lys Ala Phe Thr Gln Lys Ser
420 425 430

Thr Leu Ser Leu His Gln Arg Ile His Ser Gly Gln Lys Ser Tyr Val
435 440 445

Cys Ile Glu Cys Gly Gln Ala Phe Ile Gln Lys Ala His Leu Ile Val
450 455 460

His Gln Arg Ser His Thr Gly Glu Lys Pro Tyr Gln Cys His Asn Cys
465 470 475 480

Gly Lys Ser Phe Ile Ser Lys Ser Gln Leu Asp Ile His His Arg Ile
485 490 495

His Thr Gly Glu Lys Pro Tyr Glu Cys Ser Asp Cys Gly Lys Thr Phe
500 505 510

Thr Gln Lys Ser His Leu Asn Ile His Gln Lys Ile His Thr Gly Glu
515 520 525

Arg His His Val Cys Ser Glu Cys Gly Lys Ala Phe Asn Gln Lys Ser
530 535 540

Ile Leu Ser Met His Gln Arg Ile His Thr Gly Glu Lys Pro Tyr Lys
545 550 555 560



Cys Ser Glu Cys Gly Lys Ala Phe Thr Ser Lys Ser Gln Phe Lys Glu
565 570 575

His Gln Arg Ile His Thr Gly Glu Lys Pro Tyr Val Cys Thr Glu Cys
580 585 590

Gly Lys Ala Phe Asn Gly Arg Ser Asn Phe His Lys His Gln Ile Thr
595 600 605

His Thr Arg Glu Arg Pro Phe Val Cys Tyr Lys Cys Gly Lys Ala Phe
610 615 620

Val Gln Lys Ser Glu Leu Ile Thr His Gln Arg Thr His Met Gly Glu
625 630 635 640

Lys Pro Tyr Glu Cys Leu Asp Cys Gly Lys Ser Phe Ser Lys Lys Pro
645 650 655

Gln Leu Lys Val His Gln Arg Ile His Thr Gly Glu Arg Pro Tyr Val
660 665 670

Cys Ser Glu Cys Gly Lys Ala Phe Asn Asn Arg Ser Asn Phe Asn Lys
675 680 685

His Gln Thr Thr His Thr Arg Asp Lys Ser Tyr Lys Cys Ser Tyr Ser
690 695 700

Val Lys Gly Phe Thr Lys Gln
705 710



(2) INFORMATION FOR SEQ ID NO: 11: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2133 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA( genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



ATGCCTGCTG ATOIGAATTT ATOOCAGAAG CCTCAGGTOC TGGGTOCAGA GAAGCAGGAT 60 

QGA TC TT Q 0 G AGGCATCAGT GTCATTTGAG GAOGTGADOG TGGACTTCAG CAGGGAGGAG 120 

TGGCAGCAAC TGGACCCTGC CCAGAGATGC CTGTACOGGG ATCTGATGCT GGAGCTCTAT 180 

AGOCATCTCT TOGCAGTGGG GTATCACATT 00CAAC0CAG AGGTCATCTT CAGAATGCTA 240 

AAAGAAAAGG AGCCGOGTGT GGAGGAGGCT GAAGTCTCAC ATCAGAGGTG TCAAGAAAGG 300

GACTTTGGGC TTGAAATCOC ACAAAAGGAG ATTTCTAAGA AAGCTTCATT TCAAAAGGAT 360

ATGGTAGGTG AGTTCACAAG AGATGGTTCA TGCTCTTCCA TTTTAGAAGA ACTGAGGCTG 420 

GATGCTGAOC GCACAAAGAA AGATGAGCAA AATCAAATTC AACCCATGAG TCACAGTGCT 480 

TTCTTCAACA AGAAAACATT GAACACAGAA AGCAATTOTG AATATAAGGA COCTGGGAAA 540 

ATGATTOGCA OGAGGOOOCA CCTTGCTTCT TCACAGAAAC AAOCTCAGAA ATGTTGCTTA 600

TTTACAGAAA G7ITTGAAGCT GAAOCTAGAA GTGAACGGTC AGAATGAAAG CAATGACACA 660 

GAACAGCTTG ATGAOGTTGT TGGGTCTGGT CAGCTATTCA GOCATAGCTC TTCTGATGOC 720 

TGCAGCAAGA ATATTCATAC AGGAGAGACA TTTTGCAAAG GTAAOCAGTC TAGAAAAGTC 780 

TGTOGOCATA AACAGTCACT CAAGCAACAT CAAATTCATA CTCAGAAGAA AOCAGATGGA 840 

TGTTCTGAAT GTGGGGGGAG CITCAOOCAG AAOTCACAOC TCTTTGOCCA ACAGAGAATT 900

CATACTCTAG GAAAOCTOCA TGAATGTGGC AAATGTGGAA AAGOCTTCAT GOCACAACTA 960 

AAACTCACTG TATATCTGAC AGATCATACA GGTGATATAC OCTGTATATG CAAGGAATCT 1020 

GQGAAGCTCT TTATTCAGAG ATCAGAATTG CTTAGGCAOC AGAAAACACA CACTAGAAAG 1080 

AAGOOCTATA AATOOCATGA CTOTGGAAAA GOCTTTITOC AGATOTTATC TCTCTTCAGA 1140 

CATCAGAGAA CTCACAGTAG AGAAAAACTC TATGAATGCA GTGAATGTGG CAAAGGCTTC 1200 

TOOCAAAACT OttOOCTCAT TATACATCAG AAAATTCATA CTGGTGAGAG ACAGTATGCA 1260 

TGCAGTCAAT CTQGGAAAGC CITTAOOCAG AAGTCAACAC TCAGCTTGCA OCAGAGAATC 1320 

CACICAGGGC AGAAGTCCTA TGTOTCTATC GAATGOGGGC AGGOCTTCAT OCAGAAGGCA 1380

CAOCTGATTG TOCATCAAAG AAGOCACACA GGAGAAAAAC CITATCAGTG OCACAACTGT 1440 

GQGAAATOCT TCATTTOCAA GTCACAGCTT GATATACATC ATOGAATTCA TACAGGGGAG 1500 

AAAOCTTATC AATGCAGTGA CTOTGGAAAA Aa^PTCACXX AAAAGTCACA CCTGAATATA 1560 

CAOCAGAAAA TTCATACTGG AGAAAGACAC CATCTATGCA GTGAATGOGG GAAAGOCTTC 1620 

AAOCAGAAGT CAATACTCAG CATGCATCAG AGAATTCACA OOGGAGAGAA GCCTTACAAA 1680

TGCACTGAAT GTGGGAAAGC CTTCACTTCT AAGTCTCAAT TCAAAGAGCA TCAGOGAATT 1740 

CACAOGGGTG AGAAAOOCTA TGTCTGCACT GAATOTGGGA AGGCCTTCAA OGGCAGGTCA 1800 

AATTTOCATA AACATCAAAT AACTCACACT AGAGAGAGGC CTTTTGTCTG TTACAAATGT 1860

GGGAAGGCTT TTGTOCAGAA ATCAGAGTTG ATTAGCCATC AAAGAACTCA CATGGGAGAG 1920 

AAAOOCTATO AATGOCTTGA CTGTGGGAAA TOGTTCACTA AGAAAGCACA ACTCAAGGTG 1980 

CATCAGOGAA TTCACAOQGG AGAAAGADCT TATGTGTOTT CTGAATGTGG AAAGQOCTTC 2040 

AACAACAGGT CAAACTTCAA TAAACAOCAA ACAACTCATA OCAGAGACAA ATCTTACAAA 2100

TGCACTTATT CTGTGAAAGG CTTTAOCAAG CAA 2133 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3754 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 
(B) CLONE: GEN-076C09

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 346..2478



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 



GCTAAGCCTA TOTOGCTTAC TGGAOGCTGA AGTGATTOGG AATATTAGCA GTGGGGCTTC 60 

TGTAGGGTCA GGAAGGGGOG GCTOGCTTTC GGGGAGTGAT GAGGQGCTTG TTGGGGGTOG 120

GGGTGOGTGA TAAAGGGATT TCTOGGCTGA AGACGAGGCT GTGAGGCTTC TQCAGAACOC 180 

CCAGGTCAGG CCACATCATT GAGGCTGCAG GATCTCTCTT CATAGCOCAG TACGACTCTC 240 

CGOOCTCTCC CTGCTTGGAA AATCCAAACA CCTATOCAGC TTCTGGCTOC TGGGAAAACT 300 

GGAGTTGTCA GCAAGAGAGA CCGAGAGTAG AAGCCCAGAG TGGAG ATG OCT GCT 354 

Met Pro Ala 

GAT CTG AAT TTA TCC CAG AAG OCT CAG GTC CTG GGT CCA GAG AAG CAG 402
Asp Val Asn Leu Ser Gln Lys Pro Gln Val Leu Gly Pro Glu Lys Gln
5 10 15

GAT GGA TCT TGC GAG GCA TCA GTG TCA TTT GAG GAC GTG ADC CTG GAC 450
Asp Gly Ser Cys Glu Ala Ser Val Ser Phe Glu Asp Val Thr Val Asp
20 25 30 35



TTC AGC AGG GAG GAG TOG CAG CAA CTG GAC OCT GOC CAG AGA TGC CTG 498 
Phe Ser Arg Glu Glu Trp Gln Gln Leu Asp Pro Ala Gln Arg Cys Leu
40 45 50 

TAG CGG GAT GTG ATG CTG GAG CTC TAT AGC CAT CPC TTC GCA CTG GGG 546 
Tyr Arg Asp Val Met Leu Glu Leu Tyr Ser His Leu Phe Ala Val Gly
55 60 65 



TAT CAC ATT COC AAC CCA GAG GTC ATC TTC AGA ATG CTA AAA GAA AAG 594
Tyr His Ile Pro Asn Pro Glu Val Ile Phe Arg Met Leu Lys Glu Lys
70 75 80

GAG COG COT GTG GAG GAG OCT GAA GTC TCA CAT CAG AGG TGT CAA GAA 642 
Glu Pro Arg Val Glu Glu Ala Glu Val Ser His Gln Arg Cys Gln Glu
85 90 95 

AGG GAG TTT GGG CTT GAA ATC OCA CAA AAG GAG ATT TCT AAG AAA GCT 690 
Arg Glu Phe Gly Leu Glu Ile Pro Gln Lys Glu Ile Ser Lys Lys Ala
100 105 110 115 

TCA TIT CAA AAG GAT ATG GTA GCT GAG TTC ACA AGA GAT GCT TCA TOG 738 
Ser Phe Gln Lys Asp Met Val Gly Glu Phe Thr Arg Asp Gly Ser Trp
120 125 130 

TCT TCC ATT TTA GAA GAA CTG AGG CTG GAT GCT GAC OGC ACA AAG AAA 786 
Cys Ser Ile Leu Glu Glu Leu Arg Leu Asp Ala Asp Arg Thr Lys Lys
135 140 145 

GAT GAG CAA AAT CAA ATT CAA COC ATG ACT CAC ACT GCT TTC TTC AAC 834 
Asp Glu Gln Asn Gln Ile Gln Pro Met Ser His Ser Ala Phe Phe Asn
150 155 160 

AAG AAA ACA TTC AAC ACA GAA AGC AAT TCT GAA TAT AAG GAC OCT GGG 882 
Lys Lys Thr Leu Asn Thr Glu Ser Asn Cys Glu Tyr Lys Asp Pro Gly 
165 170 175 

AAA ATG ATT OGC A0G AGG O0C CAC CTT GCT TCT TCA CAG AAA CAA OCT 930 
Lys Met Ile Arg Thr Arg Pro His Leu Ala Ser Ser Gln Lys Gln Pro
180 185 190 195 



CAG AAA TOT TQC TTA TTT ACA GAA ACT TTG AAG CTG AAC CTA GAA GTG
Gln Lys Cys Cys Leu Phe Thr Glu Ser Leu Lys Leu Asn Leu Glu Val
200 205 210 

AAC GOT CAG AAT GAA AGC AAT GAC ACA GAA CAG CTT GAT GAC GTT GTT 
Asn Gly Gln Asn Glu Ser Asn Asp Thr Glu Gln Leu Asp Asp Val Val
215 220 225 

GGG TCT GOT CAG CTA TTC AGC CAT AGC TCT TCT GAT GCC TGC AGC AAG 
Gly Ser Gly Gln Leu Phe Ser His Ser Ser Ser Asp Ala Cys Ser Lys
230 235 240 

AAT ATT CAT ACA GGA GAG ACA TTT TQC AAA GOT AAC CAG TOT AGA AAA 
Asn Ile His Thr Gly Glu Thr Phe Cys Lys Gly Asn Gln Cys Arg Lys
245 250 255 

GTC TOT GGC CAT AAA CAG TCA CTC AAG CAA CAT CAA ATT CAT ACT CAG 
Val Cys Gly His Lys Gln Ser Leu Lys Gln His Gln Ile His Thr Gln
260 265 270 275 

AAG AAA OCA GAT GGA TCT TCT GAA TCT GGG GGG AGC TTC ACC CAG AAG 
Lys Lys Pro Asp Gly Cys Ser Glu Cys Gly Gly Ser Phe Thr Gln Lys
280 285 290 

TCA CAC CTC TTT GOC CAA CAG AGA ATT CAT ACT CTA GGA AAC CTC CAT 
Ser His Leu Phe Ala Gln Gln Arg Ile His Ser Val Gly Asn Leu His
295 300 305 

GAA TCT GGC AAA TCT GGA AAA GOC TTC ATG OCA CAA CTA AAA CTC ACT 
Glu Cys Gly Lys Cys Gly Lys Ala Phe Met Pro Gln Leu Lys Leu Ser
310 315 320 

CTA TAT CTG ACA GAT CAT ACA GCT GAT ATA GOC TCT ATA TGC AAG GAA 
Val Tyr Leu Thr Asp His Thr Gly Asp Ile Pro Cys Ile Cys Lys Glu
325 330 335 

TCT GGG AAG GTC TTT ATT CAG AGA TCA GAA TTG CTT AOG CAC CAG AAA 
Cys Gly Lys Val Phe Ile Gln Arg Ser Glu Leu Leu Thr His Gln Lys
340 345 350 355 

ACA CAC ACT AGA AAG AAG OOC TAT AAA TGC CAT GAC TCT GGA AAA GOC 
Thr His Thr Arg Lys Lys Pro Tyr Lys Cys His Asp Cys Gly Lys Ala 
360 365 370 

TTT TTC CAG ATG TTA TCT CTC TTC AGA CAT CAG AGA ACT CAC ACT AGA 
Phe Phe Gln Met Leu Ser Leu Phe Arg His Gln Arg Thr His Ser Arg
375 380 385 

GAA AAA CTC TAT GAA TGC ACT GAA TCT GGC AAA GGC TTC TOC CAA AAC 
Glu Lys Leu Tyr Glu Cys Ser Glu Cys Gly Lys Gly Phe Ser Gln Asn
390 395 400

TCA AOC CTC ATT ATA CAT CAG AAA ATT CAT ACT GGT GAG AGA CAG TAT 1602 
Ser Thr Leu Ile Ile His Gln Lys Ile His Thr Gly Glu Arg Gln Tyr
405 410 415

GCA TGC ACT GAA TGT GGG AAA GOC TTT AOC CAG AAG TCA ACA CTC AGC 1650 
Ala Cys Ser Glu Cys Gly Lys Ala Phe Thr Gln Lys Ser Thr Leu Ser
420 425 430 435

TTG CAC CAG AGA ATC CAC TCA GGG CAG AAG TOC TAT GTG TGT ATC GAA 1698 
Leu His Gln Arg Ile His Ser Gly Gln Lys Ser Tyr Val Cys Ile Glu
440 445 450 

TGC GGG CAG GOC TIC ATC CAG AAG GCA CAC CTG ATT GTC CAT CAA AGA 1746 
Cys Gly Gln Ala Phe Ile Gln Lys Ala His Leu Ile Val His Gln Arg
455 460 465 

AGC CAC ACA GGA GAA AAA OCT TAT CAG TGC CAC AAC TGT GGG AAA TOC 1794
Ser His Thr Gly Glu Lys Pro Tyr Gln Cys His Asn Cys Gly Lys Ser
470 475 480 



TTC ATT TOC AAG TCA CAG CTT GAT ATA CAT CAT OGA ATT CAT ACA GGG 1842 
Phe Ile Ser Lys Ser Gln Leu Asp Ile His His Arg Ile His Thr Gly
485 490 495 

GAG AAA OCT TAT GAA TGC ACT GAC TGT GGA AAA AOC TTC ADC CAA AAG 1890 
Glu Lys Pro Tyr Glu Cys Ser Asp Cys Gly Lys Thr Phe Thr Gln Lys
500 505 510 515 

TCA CAC CTG AAT ATA CAC CAG AAA ATT CAT ACT GGA GAA AGA CAC CAT 1938 
Ser His Leu Asn Ile His Gln Lys Ile His Thr Gly Glu Arg His His
520 525 530 

CTA TGC ACT GAA TGC GGG AAA GOC TTC AAC CAG AAG TCA ATA CTC AGC 1986
Val Cys Ser Glu Cys Gly Lys Ala Phe Asn Gln Lys Ser Ile Leu Ser
535 540 545 

ATG CAT CAG AGA ATT CAC AOC GGA GAG AAG OCT TAC AAA TGC ACT GAA 2034 
Met His Gln Arg Ile His Thr Gly Glu Lys Pro Tyr Lys Cys Ser Glu
550 555 560 

TOT GGG AAA GOC TTC ACT TCT AAG TCT CAA TTC AAA GAG CAT CAG OGA 2082 
Cys Gly Lys Ala Phe Thr Ser Lys Ser Gln Phe Lys Glu His Gln Arg
565 570 575 



ATT CAC AOG OCT GAG AAA OOC TAT CTG TGC ACT GAA TCT GGG AAG GOC 2130 
Ile His Thr Gly Glu Lys Pro Tyr Val Cys Thr Glu Cys Gly Lys Ala
580 585 590 595 



TTC AAC GGC AGG TCA AAT TTC CAT AAA CAT CAA ATA ACT CAC ACT AGA 2178 
Phe Asn Gly Arg Ser Asn Phe His Lys His Gln Ile Thr His Thr Arg
600 605 610 

GAG AGG OCT TTT GTC TGT TAC AAA TOT GGG AAG GCT TTT GTC CAG AAA 2226 
Glu Arg Pro Phe Val Cys Tyr Lys Cys Gly Lys Ala Phe Val Gln Lys
615 620 625 

TCA GAG TTG ATT ACC CAT CAA AGA ACT CAC ATG GGA GAG AAA COC TAT 2274 
Ser Glu Leu Ile Thr His Gln Arg Thr His Met Gly Glu Lys Pro Tyr
630 635 640 

GAA TGC CTT GAC TGT GGG AAA TOG TTC ACT AAG AAA CCA CAA CTC AAG 2322 
Glu Cys Leu Asp Cys Gly Lys Ser Phe Ser Lys Lys Pro Gln Leu Lys
645 650 655 



GTG CAT CAG GGA ATT CAC AOG GGA GAA AGA OCT TAT GTG TOT TOT GAA 2370 
Val His Gln Arg Ile His Thr Gly Glu Arg Pro Tyr Val Cys Ser Glu
660 665 670 675

TOT GGA AAG GCC TTC AAC AAC AGG TCA AAC TTC AAT AAA CAC CAA ACA 2418 
Cys Gly Lys Ala Phe Asn Asn Arg Ser Asn Phe Asn Lys His Gln Thr
680 685 690



ACT CAT ACC AGA GAC AAA TCT TAC AAA TGC ACT TAT TCT GTG AAA GGC 2466 
Thr His Thr Arg Asp Lys Ser Tyr Lys Cys Ser Tyr Ser Val Lys Gly 
695 700 705 

TTT ACC AAG CAA TGAATTOCTA CTGCATCAGC ATATTCATAA ATGAAATATA 2518 
Phe Thr Lys Gln
710 



CTCOGAGTrr CTTGAAGAAG AGAACATCTT CTCAGAATCA GGTCTAATTA TATGTTATTG 2578 

AATTCATGCT TCAGAAAAAC TCTAGGGATG CACTGCATGT GTGAACACAT GATAAAAAAG 2638 

TCATGCTTTA TTTTAGTGAG GGCAATTACA GAGAAAAGAG TAAGCAGAAA TGTOCTTCTG 2698 

ACTACTGGOC TCATTAAGGA TTATAAATTT TCTOOOOGQG AAGAAAOOCT GACTAACGCA 2758 

TTGAGAAAAG CCTTTCTGTA AAGAATGCTA CAAGACAGCT TGTTACTOGA TTATTTATAG 2818 

TAAAATATGT GGGAAATTAT ATCAATGATA AOOCTGTTTA TTCTGGGATA TCAATATTTT 2878 

TAAACTGOCA ACACAGTCAT GATAGGACAA TATTTTATGT GTCTGTGTGC GOCTTATGTA 2938

TATAAGCATA TATATAATAT ATAAGCATAT TATTATATAC AGGTTCAGTA T00CTTCT0C 2998 

AAAATGOCTG GGATCAGAAG CATTTTGGAT TTCAGATACT TACAGATTTT GGAATATTTG 3058 

CATTATATTT ATTGGTTGAG CATOOCTAAT CTGAAAATOC AAGATTAAAT GCTOCAATTA 3118

GCATTTOCTT TGAGOGTCAT GTTAGAGTTC AAAAACTTTC AGATTTTGGG TTTPCAGATT 3178 

AGGAATACOC AAOCTGTATG TACGTATATT TCTGTATCTA TGTATGTATA TATATGCATA 3238 

TGCAGACATA TGTATATGGT CTGGTCAGCA TATCTGTATG TATQOGTATG TATGTATCTA 3298 

TGTATGOOCT CAGTGCAGTG GGGTTTGCTG CAGAATTCAC TGCATAGCAG GAGATGTAAG 3358 

GAGATGACTT ATTTTTTAAG AGAATCTAAT CTAATTGTTT TTATAAAAAT TATTOOCTAT 3418 

TGAATATTTA TATAATGAGG TTCTATCAAC AATGATTAAC TOCTTTATTA TACATACACA 3478 

TGAATCTGCA TTTTTGC5TAA ATOCATAAAT GAGATTCTAT AATCTTTACT GATCTTTATA 3538 

TTACAGATTT TCTCTTCTTT TAGGATTAGC TCAGCTTGOC OOOOCTTPOC ATCTOCADCA 3598 

TCTATAGTGA GCX^CTOCAT AATTAGTGCC AAOCATTAGT CTOGITCATA TTTTTACAOC 3658

AGGAGTCAAC AAACTCTGOC ATTGGOCAAA TATOGOCTOC CAACTGTTTT TTTAAAATAA 3718 

ACTTTTATTG GAACACAAAA AAAAAAAAAA AAAAAA 3754 



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 389 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 



Met Ala Asp Pro Arg Asp Lys Ala Leu Gln Asp Tyr Arg Lys Lys Leu
1 5 10 15

Leu Glu His Lys Glu Ile Asp Gly Arg Leu Lys Glu Leu Arg Glu Gln
20 25 30

Leu Lys Glu Leu Thr Lys Gln Tyr Glu Lys Ser Glu Asn Asp Leu Lys
35 40 45

Ala Leu Gln Ser Val Gly Gln Ile Val Gly Glu Val Leu Lys Gln Leu
50 55 60

Thr Glu Glu Lys Phe Ile Val Lys Ala Thr Asn Gly Pro Arg Tyr Val
65 70 75 80



Val Gly Cys Arg Arg Gln Leu Asp Lys Ser Lys Leu Lys Pro Gly Thr
85 90 95

Arg Val Ala Leu Asp Met Thr Thr Leu Thr Ile Met Arg Tyr Leu Pro
100 105 110

Arg Glu Val Asp Pro Leu Val Tyr Asn Met Ser His Glu Asp Pro Gly
115 120 125

Asn Val Ser Tyr Ser Glu Ile Gly Gly Leu Ser Glu Gln Ile Arg Glu
130 135 140

Leu Arg Glu Val Ile Glu Leu Pro Leu Thr Asn Pro Glu Leu Phe Gln
145 150 155 160

Arg Val Gly Ile Ile Pro Pro Lys Gly Cys Leu Leu Tyr Gly Pro Pro
165 170 175

Gly Thr Gly Lys Thr Leu Leu Ala Arg Ala Val Ala Ser Gln Leu Asp
180 185 190

Cys Asn Phe Leu Lys Val Val Ser Ser Ser Ile Val Asp Lys Tyr Ile
195 200 205

Gly Glu Ser Ala Arg Leu Ile Arg Glu Met Phe Asn Tyr Ala Arg Asp
210 215 220

His Gln Pro Cys Ile Ile Phe Met Asp Glu Ile Asp Ala Ile Gly Gly
225 230 235 240

Arg Arg Phe Ser Glu Gly Thr Ser Ala Asp Arg Glu Ile Gln Arg Thr
245 250 255

Leu Met Glu Leu Leu Asn Gln Met Asp Gly Phe Asp Thr Leu His Arg
260 265 270

Val Lys Met Thr Met Ala Thr Asn Arg Pro Asp Thr Leu Asp Pro Ala
275 280 285

Leu Leu Arg Pro Gly Arg Leu Asp Arg Lys Ile His Ile Asp Leu Pro
290 295 300

Asn Glu Gln Ala Arg Leu Asp Ile Leu Lys Ile His Ala Gly Pro Ile
305 310 315 320

Thr Lys His Gly Glu Ile Asp Tyr Glu Ala Ile Val Lys Leu Ser Asp
325 330 335

Gly Phe Asn Gly Ala Asp Leu Arg Asn Val Cys Thr Glu Ala Gly Met
340 345 350

Phe Ala Ile Arg Ala Asp His Asp Phe Val Val Gln Glu Asp Phe Met
355 360 365

Lys Ala Val Arg Lys Val Ala Asp Ser Lys Lys Leu Glu Ser Lys Leu
370 375 380

Asp Tyr Lys Pro Val
385



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1167 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA( genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



ATGQCGGACC CTAGAGATAA GGOGCTTCAG GACTAOCGCA AGAACTTGCT TGAACACAAG 60

GAGATOGAOG GCOGTCTTAA GGAGTTAAGG GAACAATTAA AAGAACTTAC CAAGCACTAT 120

GAAAAGTCTG AAAATGATCT GAAGGOOCTA CAGAGTGTTG GGCAGATOGT GGGTGAAGTC 180

CTTAAACAGT TAACTGAAGA AAAATTCATT CTTAAAGCTA OCAATGGAOC AAGATATGTT 240

CTGGGCTTCTC GTOGACAGCT TGACAAAACT AA0CPGAAGC CAGGAACAAG AGTTGCTTTG 300

GATATGACTA CACTAACTAT CATGAGATAT TTGOCGAGAG AGGTOGATCC ACTGOTTTAT 360

AACATGTCTC ATGAGGACCC TGGGAATOTT TCTTATTCTG AGATTQGAGG GCTATCAGAA 420

CAGATOOGGG AATTAAGAGA GCTGATAGAA TTACCrCTTA CAAACOCAGA CTTATTTCAG 480

CGTGTAGGAA TAATACCTOC AAAAGGCTGT TTGTTATATG GACCACCAGG TAOGGGAAAA 540

ACACTCTTGG CAGGAGCOGT TGCTAGCCAG CTGGACTGCA ATTTCTTAAA GGTTGTATCT 600

AGTTCTATTG TAGACAAGTA CATTGGTGAA AGTGCTOGTT TGATCAGAGA AATGTTTAAT 660

TATGCTAGAG ATCATCAACC ATGCATCATT TTTATGGATG AAATAGATGC TATTGGTGGT 720

OTPOGGTTTT CTGAGGGTAC TTCAGCTGAC AGAGAGATTC AGAGAAOGTT AATGGAGTTA 780 

CTGAATCAAA TGGATGGATT TGATACTCTG CATAGAGTTA AAATGAOCAT GGCTACAAAC 840 

AGACCAGATA CACTGGATOC TGCTTTGCTG OGTOCAGGAA GATTAGATAG AAAAATACAT 900 

AOTGATTTOC CAAATGAACA AGCAAGATTA GACATACTCA AAATOCATGC AGCTCOCATT 960 

ACAAAGCATG GTOAAATAGA TTATGAAGCA ATTOTGAAGC TTTOQGATOG CTTTAATGGA 1020 

GCAGATCTGA GAAATGTTTG TACTGAAGCA GCTATGTTOG CAATTOGTGC TGATCATGAT 1080 

TTTGTACTAC AQGAAGACTT CATGAAAGCA G7TCAGAAAAG TGGCTGATTC TAAGAAGCTG 1140 

GACTCTAAAT TGGACTACAA AOCTGTG 1167 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1566 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-331G07 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 17..1183

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GAGACGGCTT CTCATC ATG GOG GAG OCT AGA GAT AAG GOG CTT CAG GAC 49 
Met Ala Asp Pro Arg Asp Lys Ala Leu Gln Asp
1 5 10 

TAC CGC AAG AAG TTG CTT GAA CAC AAG GAG ATC GAC GGC CGT CTT AAG 97 
Tyr Arg Lys Lys Leu Leu Glu His Lys Glu Ile Asp Gly Arg Leu Lys
15 20 25 

GAG TTA AGG GAA CAA TTA AAA GAA CTT AOC AAG CAG TAT GAA AAG TCT 145
Glu Leu Arg Glu Gln Leu Lys Glu Leu Thr Lys Gln Tyr Glu Lys Ser
30 35 40

GAA AAT GAT CFG AAG GOC CTA CAG ACT GTT GGG CAG ATC GTG GOT GAA 193
Glu Asn Asp Leu Lys Ala Leu Gln Ser Val Gly Gln Ile Val Gly Glu
45 50 55

GTG CTT AAA CAG TTA ACT GAA GAA AAA TTC ATT GTT AAA GCT AOC AAT 241
Val Leu Lys Gln Leu Thr Glu Glu Lys Phe Ile Val Lys Ala Thr Asn
60 65 70 75

GGA OCA AGA TAT GTT GTG GCT TCT OCT CGA CAG CTT GAC AAA ACT AAG 289
Gly Pro Arg Tyr Val Val Gly Cys Arg Arg Gln Leu Asp Lys Ser Lys
80 85 90



CTG AAG CCA GGA ACA AGA GTT GCT TTG GAT ATG ACT ACA CTA ACT ATC 337 
Leu Lys Pro Gly Thr Arg Val Ala Leu Asp Met Thr Thr Leu Thr Ile
95 100 105 

ATG AGA TAT TTG COG AGA GAG GTG GAT CCA CTG GTT TAT AAC ATG TCT 385 
Met Arg Tyr Leu Pro Arg Glu Val Asp Pro Leu Val Tyr Asn Met Ser 
110 115 120 

CAT GAG GAC OCT GGG AAT GTT TCT TAT TCT GAG ATT GGA GGG CTA TCA 433 
His Glu Asp Pro Gly Asn Val Ser Tyr Ser Glu Ile Gly Gly Leu Ser
125 130 135 



GAA CAG ATC CGG GAA TTA AGA GAG CTG ATA GAA TTA OCT CTT ACA AAC 481
Glu Gln Ile Arg Glu Leu Arg Glu Val Ile Glu Leu Pro Leu Thr Asn
140 145 150 155

CCA GAG TTA TTT CAG OCT CTA GGA ATA ATA OCT OCA AAA GGC TOT TTC 529
Pro Glu Leu Phe Gln Arg Val Gly Ile Ile Pro Pro Lys Gly Cys Leu
160 165 170



TTA TAT GGA CCA CCA GCT AOG GGA AAA ACA CTC TTG GCA CGA GOC GTT 577 
Leu Tyr Gly Pro Pro Gly Thr Gly Lys Thr Leu Leu Ala Arg Ala Val 
175 180 185 

GCT AGC CAG CTG GAC TGC AAT TTC TTA AAG GTT CTA TCT ACT TCT ATT 625 
Ala Ser Gln Leu Asp Cys Asn Phe Leu Lys Val Val Ser Ser Ser Ile
190 195 200 



CTA GAC AAG TAC ATT GCT GAA ACT GCT COT TTG ATC AGA GAA ATG TTT 673
Val Asp Lys Tyr Ile Gly Glu Ser Ala Arg Leu Ile Arg Glu Met Phe
205 210 215

AAT TAT GCT AGA GAT CAT CAA CCA TGC ATC ATT TTT ATG GAT GAA ATA 721
Asn Tyr Ala Arg Asp His Gln Pro Cys Ile Ile Phe Met Asp Glu Ile
220 225 230 235

GAT GCT ATT GCT GGT OGT COG TTT TCT GAG GOT ACT TCA GCT GAC AGA 769
Asp Ala Ile Gly Gly Arg Arg Phe Ser Glu Gly Thr Ser Ala Asp Arg
240 245 250

GAG ATT CAG AGA AOG TTA ATG GAG TTA CFG AAT CAA ATG GAT GGA TIT 817
Glu Ile Gln Arg Thr Leu Met Glu Leu Leu Asn Gln Met Asp Gly Phe
255 260 265



GAT ACT CTG CAT AGA GTT AAA ATG AOC ATG GCT ACA AAC AGA CCA GAT 865 
Asp Thr Leu His Arg Val Lys Met Thr Met Ala Thr Asn Arg Pro Asp
270 275 280 

ACA CTG GAT OCT GCT TTG CTG COT CCA GGA AGA TTA GAT AGA AAA ATA 913 
Thr Leu Asp Pro Ala Leu Leu Arg Pro Gly Arg Leu Asp Arg Lys Ile
285 290 295 



CAT ATT GAT TTG CCA AAT GAA CAA GCA AGA TTA GAC ATA CTG AAA ATC 961 
His Ile Asp Leu Pro Asn Glu Gln Ala Arg Leu Asp Ile Leu Lys Ile
300 305 310 315 

CAT GCA GGT COC ATT ACA AAG CAT GCT GAA ATA GAT TAT GAA GCA ATT 1009 
His Ala Gly Pro Ile Thr Lys His Gly Glu Ile Asp Tyr Glu Ala Ile
320 325 330 

GTG AAG CTT TOG GAT GGC TTT AAT GGA GCA GAT CTG AGA AAT GTT TOT 1057 
Val Lys Leu Ser Asp Gly Phe Asn Gly Ala Asp Leu Arg Asn Val Cys
335 340 345 

ACT GAA GCA GCT ATG TTC GCA ATT OCT GCT GAT CAT GAT TTT GTA GTA 1105 
Thr Glu Ala Gly Met Phe Ala Ile Arg Ala Asp His Asp Phe Val Val
350 355 360 

CAG GAA GAC TTC ATG AAA GCA GTC AGA AAA GTG GCT GAT TCT AAG AAG 1153 
Gln Glu Asp Phe Met Lys Ala Val Arg Lys Val Ala Asp Ser Lys Lys
365 370 375 



CTG GAG TCT AAA TTG GAC TAG AAA OCT GTG TAATTTACTG TAAGATTTTT 1203 
Leu Glu Ser Lys Leu Asp Tyr Lys Pro Val
380 385 

GATQGCTGCA TGACAGATCT TGGCTTATTG TAAAAATAAA GTTAAAGAAA ATAATCTATG 1263 
TATTOQCAAT GATGTCATTA AAAGTATATG AATAAAAATA TGAOTAACAT CATAAAAATT 1323
ACTAATTCAA CTTTTAAGAT ACAGAAGAAA TTTCTATGTT TCTTAAAGTT GCATTTATTG 1383 
CAGCAAGTTA CAAAGGGAAA GTGTTGAAGC TTTTCATATT TGCTGOCTGA GCATTITCTA 1443
AAATATTGAA AGTGGTTTGA GATAGTGGTA TAAGAAAGCA TTTCTTATGA CTTATTTTGT 1503 
ATCATTTGTT TTOCTCATCT AAAAAGTTGA ATAAAATCTG TTTGATTCAG TTCTOCTAAA 1563 
AAA 1566 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 223 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



Met Ser Asp Glu Glu Ala Arg Gln Ser Gly Gly Ser Ser Gln Ala Gly
1 5 10 15

Val Val Thr Val Ser Asp Val Gln Glu Leu Met Arg Arg Lys Glu Glu
20 25 30

Ile Glu Ala Gln Ile Lys Ala Asn Tyr Asp Val Leu Glu Ser Gln Lys
35 40 45

Gly Ile Gly Met Asn Glu Pro Leu Val Asp Cys Glu Gly Tyr Pro Arg
50 55 60

Ser Asp Val Asp Leu Tyr Gln Val Arg Thr Ala Arg His Asn Ile Ile
65 70 75 80

Cys Leu Gln Asn Asp His Lys Ala Val Met Lys Gln Val Glu Glu Ala
85 90 95

Leu His Gln Leu His Ala Arg Asp Lys Glu Lys Gln Ala Arg Asp Met
100 105 110

Ala Glu Ala His Lys Glu Ala Met Ser Arg Lys Leu Gly Gln Ser Glu
115 120 125

Ser Gln Gly Pro Pro Arg Ala Phe Ala Lys Val Asn Ser Ile Ser Pro
130 135 140

Gly Ser Pro Ala Ser Ile Ala Gly Leu Gln Val Asp Asp Glu Ile Val
145 150 155 160

Glu Phe Gly Ser Val Asn Thr Gln Asn Phe Gln Ser Leu His Asn Ile
165 170 175

Gly Ser Val Val Gln His Ser Glu Gly Lys Pro Leu Asn Val Thr Val
180 185 190

Ile Arg Arg Gly Glu Lys His Gln Leu Arg Leu Val Pro Thr Arg Trp
195 200 205

Ala Gly Lys Gly Leu Leu Gly Cys Asn Ile Ile Pro Leu Gln Arg
210 215 220



(2) INFORMATION FOR SEQ ID NO: 17: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 669 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



ATGT00GAOG AGGAAGCGAG GCAGAGOGGA GGCT0CTCGC AGGC0GG0GT GCTGACTGTC 60 

AGOGACGTOC AGGAGCTGAT GOGQCGCAAG GAGGAGATAG AAGOGCAGAT CAAGGOCAAC 120 

TATGAGGTOC TGGAAAGOCA AAAAGGCATT GGGATGAAOG AGOOQCTQGT GGACTGTGAG 180 

GGCTAO00GC GGTCAGAOGT GGACCTGTAC CAACTOOGCA CCGCCAGQCA CAACATCATA 240 

TGOCTGCAGA ATGATCACAA GGCACTGATG AAGCAGGTGG AGGAGGOOCT GCACGAGCTG 300 

CAOGCT0G0G ACAAGGAGAA GCAGG000GG GACATGGCTG AGGOOCACAA AGAGGCCATG 360 

AGOOQCAAAC TGGGTCAGAG TGAGAGOCAG GG00CT0CAC GGGOCTTOGC CAAAGTGAAC 420 

AGCATCAGOC CX»3CTC00C AGCCAGCATC GOQGCTCTGC AAGTGGATGA TGAGATTOTG 480 

GAGTTOGGCT CTGTGAACAC GCAGAACTTC CAGTCACTGC ATAACATTGG CACTGTGGTG 540 

CAGCACAGTG AGGGGAAGOC CCTGAATGTG ACAGTGATOC GCAGGGGGGA AAAACACCAG 600 

CTTAGACTTG TTCCAACACG CTGGGCAGGA AAAGGACTGC TGGGCTGCAA CATTATTOCT 660 

CTGCAAAGA 669 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-163D09 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 125..793

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

ACTGTICTCG CGTTOGOGGA CQGCFGIGGT GTTTFGGCGC ATGGGCGGAG CGTAGTTACG 60 

CTXXACTGGG GOGTOCTOOC TAGOCOGGGA GOOGGGTCTC TGGAGTOGOG GOOGGGGCTT 120 

CAGG ATG TCC GAC GAG GAA GOG AGG CAG AGC GGA GGC TOC TOG CAG GOC 169 
Met Ser Asp Glu Glu Ala Arg Gln Ser Gly Gly Ser Ser Gln Ala
1 5 10 15

GGC GTC GTG ACT GTC AGC GAC GTC CAG GAG CTG ATG COG CGC AAG GAG 217 
Gly Val Val Thr Val Ser Asp Val Gln Glu Leu Met Arg Arg Lys Glu
20 25 30 

GAG ATA GAA GOG CAG ATC AAG GOC AAC TAT GAC GTG CTG GAA AGC CAA 265 
Glu Ile Glu Ala Gln Ile Lys Ala Asn Tyr Asp Val Leu Glu Ser Gln
35 40 45 

AAA GGC ATT GOG ATG AAC GAG COG CTG GTG GAC TGT GAG GGC TAC COC 313 
Lys Gly Ile Gly Met Asn Glu Pro Leu Val Asp Cys Glu Gly Tyr Pro
50 55 60 

CGG TCA GAC GTG GAC CTG TAC CAA GTC CGC ACC GOC AGG CAC AAC ATC 361 
Arg Ser Asp Val Asp Leu Tyr Gln Val Arg Thr Ala Arg His Asn Ile
65 70 75 

ATA TGC CTG CAG AAT GAT CAC AAG GCA GTG ATG AAG CAG GTG GAG GAG 409
Ile Cys Leu Gln Asn Asp His Lys Ala Val Met Lys Gln Val Glu Glu
80 85 90 95 

GOC CTG CAC CAG CTG CAC GCT OGC GAC AAG GAG AAG CAG GOC CGG GAC 457 
Ala Leu His Gln Leu His Ala Arg Asp Lys Glu Lys Gln Ala Arg Asp
100 105 110 

ATG GCT GAG GOC CAC AAA GAG GOC ATG AGC OGC AAA CTG GGT CAG ACT 505 
Met Ala Glu Ala His Lys Glu Ala Met Ser Arg Lys Leu Gly Gln Ser
115 120 125



GAG AGC CAG GOC OCT CCA OGG GOC TTC GOC AAA GTG AAC AGC ATC AGC 553 
Glu Ser Gln Gly Pro Pro Arg Ala Phe Ala Lys Val Asn Ser Ile Ser
130 135 140

COC GGC TOC CCA GOC AGC ATC GCG GGT CTG CAA GTG GAT GAT GAG ATT 601 
Pro Gly Ser Pro Ala Ser Ile Ala Gly Leu Gln Val Asp Asp Glu Ile
145 150 155 



GTG GAG TTC GGC TCT GTG AAC AGC CAG AAC TTC CAG TCA CTG CAT AAC 649 
Val Glu Phe Gly Ser Val Asn Thr Gln Asn Phe Gln Ser Leu His Asn
160 165 170 175 

ATT GGC ACT GTG GTG CAG CAC ACT GAG GGG AAG COC CTG AAT GTG ACA 697 
Ile Gly Ser Val Val Gln His Ser Glu Gly Lys Pro Leu Asn Val Thr
180 185 190



GTG ATC OGC AGG GGG GAA AAA CAC CAG CTT AGA CTT GTT OCA ACA OQC 745 
Val Ile Arg Arg Gly Glu Lys His Gln Leu Arg Leu Val Pro Thr Arg
195 200 205

TOG GCA GGA AAA GGA CTG CTG GGC TOC AAC ATT ATT OCT CTG CAA AGA 793 
Trp Ala Gly Lys Gly Leu Leu Gly Cys Asn Ile Ile Pro Leu Gln Arg
210 215 220 

TGATTGT00C TGGGGAACAG TAACAGGAAA GCATCTTC0C TTG00CTGGA CTTGQGTCTA 853 

GQGATTPOCA ACITGTCTTC TCTOOCTGAA GCATAAGGAT CTGGAAGAGG CTTCTAAOCT 913 

GAACTTCTGT GTGGTGGCAG TACTGTGGOC CAOCAGTCTA ATCTOOCTGG ATTAAGGCAT 973

TCTTAAAAAC TTAGGCTTGG CCTCTTTCAC AAATTAGGOC ADGGOOCTAA ATAGGAATTC 1033 

OCTGGATTCT GGGCAAGTGG GOGGAAGTTA TTCTGGCAGG TACTGCTCTG ATTATTATTA 1093 

TTATTTITAA TAAAGACTTT TACAGTGCTG ATATG 1128 
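
The CDS feature locations given in this listing can be cross-checked against the stated protein lengths. The following is a minimal sketch and not part of the original listing. It assumes that a location such as 125..793 is 1-based and inclusive and that the span ends at the last sense codon (the stop codon is excluded). The names `cds_span` and `cds_codons` are hypothetical helpers introduced here for illustration only.

```python
def cds_span(location: str) -> tuple[int, int]:
    """Parse a 'start..end' feature location (assumed 1-based, inclusive).

    Stray spaces left by text extraction (e.g. '125.. 793') are ignored.
    """
    start_text, end_text = location.replace(" ", "").split("..")
    return int(start_text), int(end_text)


def cds_codons(location: str) -> int:
    """Return the number of codons covered by a CDS location.

    Raises ValueError if the span is not a whole number of codons.
    """
    start, end = cds_span(location)
    length = end - start + 1  # inclusive span length in bases
    if length % 3 != 0:
        raise ValueError(f"CDS span {location!r} is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    # CDS location of SEQ ID NO: 18 as stated in the feature table
    print(cds_codons("125..793"))
```

Under these assumptions, 125..793 spans 669 bases, that is 223 codons, which agrees with the 669 base pairs stated for SEQ ID NO: 17 and the 223 amino acids stated for SEQ ID NO: 16.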



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 506 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



Met Ala Glu Ala Asp Phe Lys Met Val Ser Glu Pro Val Ala His Gly
1 5 10 15

Val Ala Glu Glu Glu Met Ala Ser Ser Thr Ser Asp Ser Gly Glu Glu
20 25 30

Ser Asp Ser Ser Ser Ser Ser Ser Ser Thr Ser Asp Ser Ser Ser Ser
35 40 45

Ser Ser Thr Ser Gly Ser Ser Ser Gly Ser Gly Ser Ser Ser Ser Ser
50 55 60

Ser Gly Ser Thr Ser Ser Arg Ser Arg Leu Tyr Arg Lys Lys Arg Val
65 70 75 80

Pro Glu Pro Ser Arg Arg Ala Arg Arg Ala Pro Leu Gly Thr Asn Phe
85 90 95

Val Asp Arg Leu Pro Gln Ala Val Arg Asn Arg Val Gln Ala Leu Arg
100 105 110

Asn Ile Gln Asp Glu Cys Asp Lys Val Asp Thr Leu Phe Leu Lys Ala
115 120 125

Ile His Asp Leu Glu Arg Lys Tyr Ala Glu Leu Asn Lys Pro Leu Tyr
130 135 140

Asp Arg Arg Phe Gln Ile Ile Asn Ala Glu Tyr Glu Pro Thr Glu Glu
145 150 155 160

Glu Cys Glu Trp Asn Ser Glu Asp Glu Glu Phe Ser Ser Asp Glu Glu
165 170 175

Val Gln Asp Asn Thr Pro Ser Glu Met Pro Pro Leu Glu Gly Glu Glu
180 185 190

Glu Glu Asn Pro Lys Glu Asn Pro Glu Val Lys Ala Glu Glu Lys Glu
195 200 205

Val Pro Lys Glu Ile Pro Glu Val Lys Asp Glu Glu Lys Glu Val Ala
210 215 220

Lys Glu Ile Pro Glu Val Lys Ala Glu Glu Lys Ala Asp Ser Lys Asp
225 230 235 240

Cys Met Glu Ala Thr Pro Glu Val Lys Glu Asp Pro Lys Glu Val Pro
245 250 255

Gln Val Lys Ala Asp Asp Lys Glu Gln Pro Lys Ala Thr Glu Ala Lys
260 265 270

Ala Arg Ala Ala Val Arg Glu Thr His Lys Arg Val Pro Glu Glu Arg
275 280 285

Leu Arg Asp Ser Val Asp Leu Lys Arg Ala Arg Lys Gly Lys Pro Lys
290 295 300

Arg Glu Asp Pro Lys Gly Ile Pro Asp Tyr Trp Leu Ile Val Leu Lys
305 310 315 320

Asn Val Asp Lys Leu Gly Pro Met Ile Gln Lys Tyr Asp Glu Pro Ile
325 330 335

Leu Lys Phe Leu Ser Asp Val Ser Leu Lys Phe Ser Lys Pro Gly Gln
340 345 350

Pro Val Ser Tyr Thr Phe Glu Phe His Phe Leu Pro Asn Pro Tyr Phe
355 360 365

Arg Asn Glu Val Leu Val Lys Thr Tyr Ile Ile Lys Ala Lys Pro Asp
370 375 380

His Asn Asp Pro Phe Phe Ser Trp Gly Trp Glu Ile Glu Asp Cys Lys
385 390 395 400

Gly Cys Lys Ile Asp Arg Arg Arg Gly Lys Asp Val Thr Val Thr Thr
405 410 415

Thr Gln Ser Arg Thr Thr Ala Thr Gly Glu Ile Glu Ile Gln Pro Arg
420 425 430

Val Val Pro Asn Ala Ser Phe Phe Asn Phe Phe Ser Pro Pro Glu Ile
435 440 445

Pro Met Ile Gly Lys Leu Glu Pro Arg Glu Asp Ala Ile Leu Asp Glu
450 455 460

Asp Phe Glu Ile Gly Gln Ile Leu His Asp Asn Val Ile Leu Lys Ser
465 470 475 480

Ile Tyr Tyr Tyr Thr Gly Glu Val Asn Gly Thr Tyr Tyr Gln Phe Gly
485 490 495

Lys His Tyr Gly Asn Lys Lys Tyr Arg Lys
500 505



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1518 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



ATGGCAGAAG CAGATTITAA AATGGTCTOG GAAGCTGTOG COCATGGGGT TGOOGAAGAG 60 

GAGATGGCTA GCTOGACTAG TGATTCTGGG GAAGAATCTG ACAGCAGTAG CTCTAGCAGC 120 

AGCACTAGTG ACAGCAGCAG CAGCAGCAGC ACTAGTGGCA GCAGCAGCGG CAGCGGCAGC 180

AGCAGCAGCA GCAGOGGCAG CACTAGCAGC CGCAGOCGCT TCTATAGAAA GAAGAGGGTA 240 

CCTGAGOCTT CCAGAAGGGC GOQGCGGGOC COGTTQGGAA CAAATTTOCT GGATAGGCTG 300 

CCTCAGQCAG TTAGAAATOG TGTGCAAGOG CTTAGAAACA TTCAAGATGA ATGTGACAAG 360 

GTAGATAOOC TGTTCTTAAA AGCAATTCAT GATCTTGAAA GAAAATATGC TGAACTCAAC 420 

AAGOCTCTGT ATGATAGGOG GTTTCAAATC ATCAATGCAG AATADGAGCC TACAGAAGAA 480 

GAATGTGAAT GGAATTCAGA GGATGAGGAG TTCAGCAGTO ATGAGGAGGT GCAGGATAAC 540 

AO00CTAGTG AAATGCCTCC CTTAGAGGGT GAGGAAGAAG AAAACOCTAA AGAAAACOCA 600 

GAGGTGAAAG CTGAAGAGAA GGAAGTTCCT AAAGAAATTC CTGAGGTGAA GGATGAAGAA 660

AAQGAAGITG CTAAAGAAAT TOCTGAGOTA AAGGCFGAAG AAAAAGCAGA TTCTAAAGAC 720 

TGTATGGAGG CAACCCCTGA AGTAAAAGAA GATOCTAAAG AAGTCCOCCA GGTAAAGGCA 780 

GATGATAAAG AACAGOCTAA AGCAACAGAG GCTAAGQCAA GGGCTGCAGT AAGAGAGACT 840 

CATAAAAGAG TTCCTGAGGA AAGGCTTCGG GACAGTGTAG ATCTTAAAAG AGCTAGGAAG 900 

GGAAAQCCTA AAAGAGAAGA OOCTAAAGGC ATTOCTGACT ATTGGCTGAT TGTTTTAAAG 960 

AATOITCACA AGCTOGGGOC TATGATTCAG AAGTATGATG AGOOCATTCT GAAGTTCTTG 1020 

TOGGATGTTA GCCTGAACTT CTCAAAAOCT QQCXAGOCTG TAAGTTACAC CTTTGAATTf 1080 

CATTTTCTAC CCAACCCATA CTTCAGAAAT GAGGTGCTGG TGAAGACATA TATAATAAAG 1140

QCAAAACCAG ATCACAATGA TOOCTTCTIT TCTTGGQGAT GGGAAATTGA AGATTOCAAA 1200 

GGCTGCAAGA TAGAOOQGAG AAGAOGAAAA GATCTTACTG TGACAACTAC CCAGACTCOC 1260 

ACAACTGCTA CTGGAGAAAT TGAAATOCAG OCAAGAGTGG TTCCTAATGC ATCATTCTTC 1320 

AACTTCITTA CTOCTOCTGA GATTOCTATG ATTGGGAAGC TGGAAOCAOG AGAAGATGCT 1380 

ATOCTGGATG AGGACTTTGA AATTQQGCAG ATTTTACATG ATAATGTCAT OCTGAAATCA 1440 

ATCTATTACT ATACTGGAGA AGTCAATGGT ACCTACTATC AATTTGGCAA ACATTATGGA 1500

AACAAGAAAT ACAGAAAA 1518 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2636 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE:

(A) LIBRARY: Human fetal brain cDNA library
(B) CLONE: GEN-078D05

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 266..1783

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GATTOGQCTG CGGTACATCT CGGCACTCTA GCTGCAGOOG GGAGAGGOCT TGO0G0CAOC 60 
AAGCCTOCAC TGOOGCTGCC ADCTCAGOQC OGGC5CTCTGC ATCCCCAGCT 120 
OCAGCTOOGC TCTGOGOOGC TGCTGCCATC GCCGCTGCCA CCTCCGCAGC COGGGOCTOC 180 

GOOGOOGOCA COCAAGCATC OGTGAGTCAT TTTCTGOCCA TCTCTGGTOG CGOGGTCTOC 240 

CTQGTAGAGT TTGTAGGCTT GCAAG ATG GCA GAA GCA GAT TTT AAA ATG GTC 292 

Met Ala Glu Ala Asp Phe Lys Met Val 
1 5 

TOG GAA OCT GTC GOC CAT GGG GIT GOC GAA GAG GAG ATG GCT AGC TOG 340 
Ser Glu Pro Val Ala His Gly Val Ala Glu Glu Glu Met Ala Ser Ser 
10 15 20 25 

ACT AGT GAT TCT GGG GAA GAA TCT GAC AGC AGT AGC TCT AGC AGC AGC 388 
Thr Ser Asp Ser Gly Glu Glu Ser Asp Ser Ser Ser Ser Ser Ser Ser 
30 35 40 



ACT AGT GAC AGC AGC AGC AGC AGC AGC ACT ACT GGC AGC AGC AGC GGC 436 
Thr Ser Asp Ser Ser Ser Ser Ser Ser Thr Ser Gly Ser Ser Ser Gly 
45 50 55

AGC GGC AGC AGC AGC AGC AGC AGC GGC AGC ACT AGC AGC GGC AGC GGC 484 
Ser Gly Ser Ser Ser Ser Ser Ser Gly Ser Thr Ser Ser Arg Ser Arg 
60 65 70 

TTG TAT AGA AAG AAG AGG GTA OCT GAG OCT TOC AGA AGG GOG OGG OGG 532 
Leu Tyr Arg Lys Lys Arg Val Pro Glu Pro Ser Arg Arg Ala Arg Arg 
75 80 85 

GOC COG TTG GGA ACA AAT TTC GTC GAT AGG CTG OCT CAG GCA GTT AGA 580 
Ala Pro Leu Gly Thr Asn Phe Val Asp Arg Leu Pro Gln Ala Val Arg
90 95 100 105 

AAT OCT GTC GAA GOG CTT AGA AAC ATT CAA GAT GAA TCT GAC AAG GTA 628 
Asn Arg Val Gln Ala Leu Arg Asn Ile Gln Asp Glu Cys Asp Lys Val
110 115 120

GAT AGC CTG TTC TTA AAA GCA ATT CAT GAT CTT GAA AGA AAA TAT GCT 676 
Asp Thr Leu Phe Leu Lys Ala Ile His Asp Leu Glu Arg Lys Tyr Ala
125 130 135 

GAA CTC AAC AAG OCT CTG TAT GAT AGG OGG TTT CAA ATC ATC AAT GCA 724 
Glu Leu Asn Lys Pro Leu Tyr Asp Arg Arg Phe Gln Ile Ile Asn Ala
140 145 150

GAA TAC GAG CCT ACA GAA GAA GAA TGT GAA TGG AAT TCA GAG GAT GAG 772
Glu Tyr Glu Pro Thr Glu Glu Glu Cys Glu Trp Asn Ser Glu Asp Glu 
155 160 165 

GAG TTC AGC AGT GAT GAG GAG GTG CAG GAT AAC AOC OCT ACT GAA ATG 820 
Glu Phe Ser Ser Asp Glu Glu Val Gln Asp Asn Thr Pro Ser Glu Met
170 175 180 185

CCT CCC TTA GAG GGT GAG GAA GAA GAA AAC CCT AAA GAA AAC CCA GAG 868
Pro Pro Leu Glu Gly Glu Glu Glu Glu Asn Pro Lys Glu Asn Pro Glu
190 195 200

GTG AAA GCT GAA GAG AAG GAA GTT CCT AAA GAA ATT CCT GAG GTG AAG 916
Val Lys Ala Glu Glu Lys Glu Val Pro Lys Glu Ile Pro Glu Val Lys
205 210 215

GAT GAA GAA AAG GAA GTT GCT AAA GAA ATT OCT GAG GTA AAG GCT GAA 964 
Asp Glu Glu Lys Glu Val Ala Lys Glu Ile Pro Glu Val Lys Ala Glu
220 225 230 

GAA AAA GCA GAT TCT AAA GAG TGT ATG GAG OCA AOC OCT GAA GTA AAA 1012 
Glu Lys Ala Asp Ser Lys Asp Cys Met Glu Ala Thr Pro Glu Val Lys 
235 240 245 



GAA GAT CCT AAA GAA GTC CCC CAG GTA AAG GCA GAT GAT AAA GAA CAG 1060
Glu Asp Pro Lys Glu Val Pro Gln Val Lys Ala Asp Asp Lys Glu Gln
250 255 260 265

CCT AAA GCA ACA GAG GCT AAG GCA AGG GCT GCA GTA AGA GAG ACT CAT 1108
Pro Lys Ala Thr Glu Ala Lys Ala Arg Ala Ala Val Arg Glu Thr His
270 275 280

AAA AGA GIT OCT GAG GAA AGG CTT OGG GAG AGT GTA GAT CTT AAA AGA 1156 
Lys Arg Val Pro Glu Glu Arg Leu Arg Asp Ser Val Asp Leu Lys Arg 
285 290 295 

GCT AGG AAG GGA AAG OCT AAA AGA GAA GAG OCT AAA GGC ATT OCT GAG 1204 
Ala Arg Lys Gly Lys Pro Lys Arg Glu Asp Pro Lys Gly Ile Pro Asp
300 305 310 



TAT TGG CTG ATT GTT TTA AAG AAT GTT GAC AAG CTC GGG CCT ATG ATT 1252
Tyr Trp Leu Ile Val Leu Lys Asn Val Asp Lys Leu Gly Pro Met Ile
315 320 325

CAG AAG TAT GAT GAG CCC ATT CTG AAG TTC TTG TCG GAT GTT AGC CTG 1300
Gln Lys Tyr Asp Glu Pro Ile Leu Lys Phe Leu Ser Asp Val Ser Leu
330 335 340 345

AAG TTC TCA AAA CCT GGC CAG CCT GTA AGT TAC ACC TTT GAA TTT CAT 1348
Lys Phe Ser Lys Pro Gly Gln Pro Val Ser Tyr Thr Phe Glu Phe His
350 355 360

TTT CTA CCC AAC CCA TAC TTC AGA AAT GAG GTG CTG GTG AAG ACA TAT 1396
Phe Leu Pro Asn Pro Tyr Phe Arg Asn Glu Val Leu Val Lys Thr Tyr
365 370 375

ATA ATA AAG GCA AAA CCA GAT CAC AAT GAT CCC TTC TTT TCT TGG GGA 1444
Ile Ile Lys Ala Lys Pro Asp His Asn Asp Pro Phe Phe Ser Trp Gly
380 385 390

TGG GAA ATT GAA GAT TGC AAA GGC TGC AAG ATA GAC OGG AGA AGA GGA 1492 
Trp Glu Ile Glu Asp Cys Lys Gly Cys Lys Ile Asp Arg Arg Arg Gly
395 400 405 

AAA GAT GTT ACT GTG ACA ACT ACC CAG AGT CGC ACA ACT GCT ACT GGA 1540
Lys Asp Val Thr Val Thr Thr Thr Gln Ser Arg Thr Thr Ala Thr Gly
410 415 420 425 

GAA ATT GAA ATC CAG CCA AGA GTG GTT CCT AAT GCA TCA TTC TTC AAC 1588
Glu Ile Glu Ile Gln Pro Arg Val Val Pro Asn Ala Ser Phe Phe Asn
430 435 440

TTC TTT AGT CCT CCT GAG ATT CCT ATG ATT GGG AAG CTG GAA CCA CGA 1636
Phe Phe Ser Pro Pro Glu Ile Pro Met Ile Gly Lys Leu Glu Pro Arg
445 450 455

GAA GAT GCT ATC CTG GAT GAG GAC TTT GAA ATT GGG CAG ATT TTA CAT 1684 
Glu Asp Ala Ile Leu Asp Glu Asp Phe Glu Ile Gly Gln Ile Leu His
460 465 470 

GAT AAT GTC ATC CTG AAA TCA ATC TAT TAG TAT ACT GGA GAA CTC AAT 1732 
Asp Asn Val Ile Leu Lys Ser Ile Tyr Tyr Tyr Thr Gly Glu Val Asn
475 480 485 

GGT ACC TAC TAT CAA TTT GGC AAA CAT TAT GGA AAC AAG AAA TAC AGA 1780
Gly Thr Tyr Tyr Gln Phe Gly Lys His Tyr Gly Asn Lys Lys Tyr Arg
490 495 500 505 

AAA TAAGTCAATC TGAAAGATTT TTCAAGAATC TTAAAATCTC AAGAACTGAA 1833
Lys

GCAGATTCAT ACAGCCTTGA AAAAAGTAAA A00CTGA0CT GTAAOCTGAA CACTATTATT 1893 
OCTTATAGTC AAGTTTTTCT GGTTTCTTGG TACTCTATAT TTTAAAAATA GT0CTAAAAA 1953 

GTGTCTAAGT GOCAGTTTAT TCTATCTAGG CTOITGTAGT ATAATATTCT TCAAAATATG 2013 

TAAGCTGTTG TCAATTATCT AAAGCATGTT AGTTTGCTGC TACACAGTGT TGATTTTPGT 2073 

GATCTOCTTr GGTCATGTTT CTGTTAGACT GTAGCTCTGA AACTGTCAGA ATTGTTAACT 2133

GAAACAAATA TTTOCTTGAA AAAAAAAGTT CATGAAGTAC CAATGCAACT CTTTTATTTT 2193 

TTTTCTTTTT T0CAG00CAT AAGACTAAGG GTTTAAATCT GCTTGCACTA GCTGTGOCTT 2253 

CATTAGTTTG CTATAGAAAT CCAGTACTTA TAGTAAATAA AACAGTGTAT TTTGAAGTTT 2313 
GACTGCTTGA AAAAGATTAG CATACATCTA ATGTGAAAAG AOCACATTTG ATTCAACTGA 2373

GAOCTTGTGT ATGTGACATA TAGTGGOCTA TAAATTTAAT CATAATGATG TTATTGTTTA 2433 

OCACTGAGCT CTTAATATAA CATAGTATTT TTGAAAAAGTT TTCTTCATCT TATATTGTGT 2493 

AATTCTAAAC TAAAGATACC GTGTTTTCTT TCTATTGTGT TCTAOCTTCX: CTTTCACTGA 2553 

AAATGATCAC TTCATTTGAT ACTGTrTTTC ATGTTCTTGT ATTGCAAOCT AAAATAAATA 2613

AATATTAAAG TGTOTTATAC TAT 2636 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 170 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Thr Glu Leu Gln Ser Ala Leu Leu Leu Arg Arg Gln Leu Ala Glu
1 5 10 15

Leu Asn Lys Asn Pro Val Glu Gly Phe Ser Ala Gly Leu Ile Asp Asp
20 25 30

Asn Asp Leu Tyr Arg Trp Glu Val Leu Ile Ile Gly Pro Pro Asp Thr
35 40 45

Leu Tyr Glu Gly Gly Val Phe Lys Ala His Leu Thr Phe Pro Lys Asp
50 55 60

Tyr Pro Leu Arg Pro Pro Lys Met Lys Phe Ile Thr Glu Ile Trp His
65 70 75 80

Pro Asn Val Asp Lys Asn Gly Asp Val Cys Ile Ser Ile Leu His Glu
85 90 95

Pro Gly Glu Asp Lys Tyr Gly Tyr Glu Lys Pro Glu Glu Arg Trp Leu
100 105 110

Pro Ile His Thr Val Glu Thr Ile Met Ile Ser Val Ile Ser Met Leu
115 120 125

Ala Asp Pro Asn Gly Asp Ser Pro Ala Asn Val Asp Ala Ala Lys Glu
130 135 140

Trp Arg Glu Asp Arg Asn Gly Glu Phe Lys Arg Lys Val Ala Arg Cys
145 150 155 160

Val Arg Lys Ser Gln Glu Thr Ala Phe Glu
165 170



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 510 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

ATGAOGGAGC TGCAGTOGGC ACTGCTACTG CGAAGACAGC TGGCAGAACT CAACAAAAAT 60 

CCAGTGGAAG GCTTTTCTGC AGGTTTAATA GATGACAATG ATCTCTACOG ATGGGAAGTC 120 

CTTATTAOTG G00CT0CAGA TACACTTTAT GAAGCTGGTG TTTTTAAQQC TCATCTTACT 180 

TTOOCAAAAG ATTATCOOCT CX3GAOCT0CT AAAATGAAAT TCATTACAGA AATCTGGCAC 240 

CCAAATGTTG ATAAAAATGG TGATGTGTGC ATTTCTATTC TTCATGAGOC TGGGGAAGAT 300 

AAGTATOGTT ATGAAAAGOC AGAGGAAOGC TGGCTOOCTA TOCACACTGT GGAAACCATC 360 

ATGATTACTG TCATTTCTAT GCTGGCAGAC CCTAATGGAG ACTCADCTGC TAATGTTGAT 420 

GCTGOGAAAG AATGGAGGGA AGATAGAAAT GGAGAATTTA AAAGAAAACT TGOCXXCTOT 480 

GTAAGAAAAA GCCAAGAGAC TGCITTTGAG 510 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 617 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE:

(A) LIBRARY: Human fetal brain cDNA library

(B) CLONE: GEN-423A12

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 19..528

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

GGGOOCTCGG CAGGGAGG ATG ACG GAG CTG CAG TOG GCA CTG CTA CTG CGA 51 

Met Thr Glu Leu Gln Ser Ala Leu Leu Leu Arg
1 5 10

AGA CAG CTG GCA GAA CTC AAC AAA AAT CCA CTG GAA GGC TTT TCT GCA 99 
Arg Gln Leu Ala Glu Leu Asn Lys Asn Pro Val Glu Gly Phe Ser Ala
15 20 25 

GGT TTA ATA GAT GAC AAT GAT CTC TAC CGA TGG GAA GTC CTT ATT ATT 147 
Gly Leu Ile Asp Asp Asn Asp Leu Tyr Arg Trp Glu Val Leu Ile Ile
30 35 40 

GGC OCT CCA GAT ACA CTT TAT GAA GGT GOT GTT TTT AAG GOT CAT CTT 195 
Gly Pro Pro Asp Thr Leu Tyr Glu Gly Gly Val Phe Lys Ala His Leu
45 50 55 

ACT TTC CCA AAA GAT TAT OOC CTC CGA OCT OCT AAA ATG AAA TTC ATT 243 
Thr Phe Pro Lys Asp Tyr Pro Leu Arg Pro Pro Lys Met Lys Phe Ile
60 65 70 75 

ACA GAA ATC TGG CAC CCA AAT GTT GAT AAA AAT GOT GAT GTG TGC ATT 291 
Thr Glu Ile Trp His Pro Asn Val Asp Lys Asn Gly Asp Val Cys Ile
80 85 90 

TCT ATT CTT CAT GAG OCT GGG GAA GAT AAG TAT GGT TAT GAA AAG CCA 339 
Ser Ile Leu His Glu Pro Gly Glu Asp Lys Tyr Gly Tyr Glu Lys Pro
95 100 105 

GAG GAA CGC TGG CTC OCT ATC CAC ACT GTG GAA ACC ATC ATG ATT ACT 387 
Glu Glu Arg Trp Leu Pro Ile His Thr Val Glu Thr Ile Met Ile Ser
110 115 120 

GTC ATT TCT ATG CTG GCA GAC OCT AAT GGA GAC TCA OCT GCT AAT GTT 435 
Val Ile Ser Met Leu Ala Asp Pro Asn Gly Asp Ser Pro Ala Asn Val
125 130 135

GAT GCT GCG AAA GAA TGG AGG GAA GAT AGA AAT GGA GAA TTT AAA AGA 483
Asp Ala Ala Lys Glu Trp Arg Glu Asp Arg Asn Gly Glu Phe Lys Arg
140 145 150 155

AAA GTT GCC OGC TGT GTA AGA AAA AGC CAA GAG ACT GCT TTT GAG 528 
Lys Val Ala Arg Cys Val Arg Lys Ser Gin Glu Thr Ala Phe Glu 
160 165 170 

TGACATTTAT TTAGCAGCTA GTAACTTCAC TTATTTCAGG GTCTCCAA1T GAGAAACATG 588 

GCACTGTTTT TOCTGCACTC TAD0CAO0G 617 



(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Met Val Leu Trp Glu Ser Pro Arg Gln Cys Ser Ser Trp Thr Leu Cys
1 5 10 15

Glu Gly Phe Cys Trp Leu Leu Leu Leu Pro Val Met Leu Leu Ile Val
20 25 30

Ala Arg Pro Val Lys Leu Ala Ala Phe Pro Thr Ser Leu Ser Asp Cys
35 40 45

Gln Thr Pro Thr Gly Trp Asn Cys Ser Gly Tyr Asp Asp Arg Glu Asn
50 55 60

Asp Leu Phe Leu Cys Asp Thr Asn Thr Cys Lys Phe Asp Gly Glu Cys
65 70 75 80

Leu Arg Ile Gly Asp Thr Val Thr Cys Val Cys Gln Phe Lys Cys Asn
85 90 95

Asn Asp Tyr Val Pro Val Cys Gly Ser Asn Gly Glu Ser Tyr Gln Asn
100 105 110

Glu Cys Tyr Leu Arg Gln Ala Ala Cys Lys Gln Gln Ser Glu Ile Leu
115 120 125

Val Val Ser Glu Gly Ser Cys Ala Thr Asp Ala Gly Ser Gly Ser Gly
130 135 140

EP 0 796 913 A2

Asp Gly Val His Glu Gly Ser Gly Glu Thr Ser Gln Lys Glu Thr Ser
145 150 155 160

Thr Cys Asp Ile Cys Gln Phe Gly Ala Glu Cys Asp Glu Asp Ala Glu
165 170 175

Asp Val Trp Cys Val Cys Asn Ile Asp Cys Ser Gln Thr Asn Phe Asn
180 185 190

Pro Leu Cys Ala Ser Asp Gly Lys Ser Tyr Asp Asn Ala Cys Gln Ile
195 200 205

Lys Glu Ala Ser Cys Gln Lys Gln Glu Lys Ile Glu Val Met Ser Leu
210 215 220

Gly Arg Cys Gln Asp Asn Thr Thr Thr Thr Thr Lys Ser Glu Asp Gly
225 230 235 240

His Tyr Ala Arg Thr Asp Tyr Ala Glu Asn Ala Asn Lys Leu Glu Glu
245 250 255

Ser Ala Arg Glu His His Ile Pro Cys Pro Glu His Tyr Asn Gly Phe
260 265 270

Cys Met His Gly Lys Cys Glu His Ser Ile Asn Met Gln Glu Pro Ser
275 280 285

Cys Arg Cys Asp Ala Gly Tyr Thr Gly Gln His Cys Glu Lys Lys Asp
290 295 300

Tyr Ser Val Leu Tyr Val Val Pro Gly Pro Val Arg Phe Gln Tyr Val
305 310 315 320

Leu Ile Ala Ala Val Ile Gly Thr Ile Gln Ile Ala Val Ile Cys Val
325 330 335

Val Val Leu Cys Ile Thr Arg Lys Cys Pro Arg Ser Asn Arg Ile His
340 345 350

Arg Gln Lys Gln Asn Thr Gly His Tyr Ser Ser Asp Asn Thr Thr Arg
355 360 365

Ala Ser Thr Arg Leu Ile
370



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1122 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

ATGCTQCTGT GGGAGTCOCC GOGGCAGTGC AGCAGCTGGA CACTTTGOGA GQQCTTTTGC 60 

TGGCTGCTGC TGCTGCCCGT CATGCTACTC ATCGTAGCCC GCCCGGTGAA GCTCGCTGCT 120

TTOOCTAOCT CCTTAAGTGA CTGOCAAAOG COCACOGGCT GGAATTGCTC TGGTTATGAT 180 

GACAGAGAAA ATGATCTCTT CCTCTCTGAC AOCAACAGCT CTAAATTTGA TGGGGAATGT 240 

TTAAGAATTG GAGACACTGT GACTTGCCTC TGTCAGTTCA AGTOCAACAA TGACTATGTG 300 

(XTCTGTGTG GCTOCAATGG GGAGAGCTAC CAGAATGACT GTTAOCTGOG ACAGGCTGCA 360 

TGCAAACAGC AGAGTGAGAT ACTTGTGGTG TCAGAAGGAT CATGTGCCAC AGATGCAGGA 420

TCAGGATCTG GAGATGGAGT CCATGAAGGC TCTGGAGAAA CTAGTCAAAA GGAGACATOC 480 

ACCTCTGATA TTTGOCAGTT TGGPGCAGAA TGTGAOGAAG ATGOOGAGGA TGTCTGGTGT 540 

GTGTOTAATA TTGACTCTTC TCAAAOCAAC TTCAATOOOC TCTGOGCTTC TGATGGGAAA 600 

TCTTATGATA ATGCATGOCA AATCAAAGAA GCATOGTGTC AGAAACAGGA GAAAATTGAA 660 

OTCATOTCTT TQGGTOGATG TCAAGATAAC ACAACTACAA CTACTAAGTC TGAAGATGGG 720 

CATTATGCAA GAACAGATTA TGCAGAGAAT GCTAACAAAT TAGAAGAAAG TGOCAGAGAA 780 

CAOCACATAC CTTCTCOQGA ACATTACAAT GGCTTCTGCA TGCATGGGAA GTOTGAGCAT 840 

TCTATCAATA TGCAGGAGOC ATCTTQCAGG TGTGATQCTG GTTATACTGG ACAACACTOT 900 

GAAAAAAAGG ACTACAGTGT TCTATADGTT GTTC000G1C CTGTAOGATT TCACTATGTC 960 

TTAATOGCAG CTGTGATTGG AACAATPCAG ATTGCTOTCA TCTCTCTGGT GGTOCTCTGC 1020 

ATCACAAGGA AATGCOOCAG AAGCAACAGA ATTCACAGAC AGAAGCAAAA TACAGGGCAC 1080 

TACAGTTCAG ACAATACAAC AAGAGOGTOC AOGAGGTTAA TC 1122 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1721 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-092E10 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 368..1489

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

CTQOQQQQOG OCTTGACTCT OOCTOCAOOC TOOCTOCTOG QGCTOGACTC GTCTQOOOCT 60 

GGACT000CT CTCCTOCTOr OCTOCGQCTT COCAGAGCTC CCTOCTTATG GCAGCAGCTT 120 

COCQOGTCTC CX3G0GC^GJCT TCTCAGOGGA CGACXXTCTC GCICOGQGGC TGAGOCAGTC 180 

CCTGGATCTT GCTGAAACTC TOGAGATCAT GOGOGGCTTT GGCTGCTQCT TOOQOGOOGG 240 

GTGOCACTGC CACOGOOGOC GCCTCTGCTG COGOCCTOOG CGQGATGCTC AGTAGOOOOC 300 

TGOOOGGOOC COGCGATCCT GTGTTOCTOG GAAGOOGTTT GCTGCTGCAG AGTTQCAOGA 360 

ACTAOTC ATG CTG CTG TGG GAG TOC COG CGG CAG TGC AGC AGC TGG ACA 409 
Met Val Leu Trp Glu Ser Pro Arg Gln Cys Ser Ser Trp Thr
1 5 10 

CTT TGC GAG GGC TTT TGC TGG CTG CTG CTG CTG COC GTC ATG CTA CTC 457 
Leu Cys Glu Gly Phe Cys Trp Leu Leu Leu Leu Pro Val Met Leu Leu
15 20 25 30 

ATC GTA GOC CGC COG GTG AAG CTC GCT OCT TTC OCT ADC TCC TTA ACT 505 
Ile Val Ala Arg Pro Val Lys Leu Ala Ala Phe Pro Thr Ser Leu Ser
35 40 45 

GAC TGC CAA ACG CCC ACC GGC TGG AAT TGC TCT GGT TAT GAT GAC AGA 553
Asp Cys Gln Thr Pro Thr Gly Trp Asn Cys Ser Gly Tyr Asp Asp Arg
50 55 60 

GAA AAT GAT CTC TTC CTC TGT GAC ACC AAC ACC TGT AAA TTT GAT GGG 601
Glu Asn Asp Leu Phe Leu Cys Asp Thr Asn Thr Cys Lys Phe Asp Gly
65 70 75

GAA TGT TTA AGA ATT GGA GAC ACT GTG ACT TGC GTC TGT CAG TTC AAG 649
Glu Cys Leu Arg Ile Gly Asp Thr Val Thr Cys Val Cys Gln Phe Lys
80 85 90

TGC AAC AAT GAC TAT GTG CCT GTG TGT GGC TCC AAT GGG GAG AGC TAC 697
Cys Asn Asn Asp Tyr Val Pro Val Cys Gly Ser Asn Gly Glu Ser Tyr
95 100 105 110

CAG AAT GAG TGT TAC CTG CGA CAG GCT GCA TGC AAA CAG CAG AGT GAG 745
Gln Asn Glu Cys Tyr Leu Arg Gln Ala Ala Cys Lys Gln Gln Ser Glu
115 120 125

ATA CTT GTG GTG TCA GAA GGA TCA TGT GCC ACA GAT GCA GGA TCA GGA 793
Ile Leu Val Val Ser Glu Gly Ser Cys Ala Thr Asp Ala Gly Ser Gly
130 135 140

TCT GGA GAT GGA GTC CAT GAA GGC TCT GGA GAA ACT AGT CAA AAG GAG 841
Ser Gly Asp Gly Val His Glu Gly Ser Gly Glu Thr Ser Gln Lys Glu
145 150 155

ACA TCC ACC TGT GAT ATT TGC CAG TTT GGT GCA GAA TGT GAC GAA GAT 889
Thr Ser Thr Cys Asp Ile Cys Gln Phe Gly Ala Glu Cys Asp Glu Asp
160 165 170

GCC GAG GAT GTC TGG TGT GTG TGT AAT ATT GAC TGT TCT CAA ACC AAC 937
Ala Glu Asp Val Trp Cys Val Cys Asn Ile Asp Cys Ser Gln Thr Asn
175 180 185 190

TTC AAT CCC CTC TGC GCT TCT GAT GGG AAA TCT TAT GAT AAT GCA TGC 985
Phe Asn Pro Leu Cys Ala Ser Asp Gly Lys Ser Tyr Asp Asn Ala Cys
195 200 205

CAA ATC AAA GAA GCA TCG TGT CAG AAA CAG GAG AAA ATT GAA GTC ATG 1033
Gln Ile Lys Glu Ala Ser Cys Gln Lys Gln Glu Lys Ile Glu Val Met
210 215 220

TCT TTG GGT CGA TGT CAA GAT AAC ACA ACT ACA ACT ACT AAG TCT GAA 1081
Ser Leu Gly Arg Cys Gln Asp Asn Thr Thr Thr Thr Thr Lys Ser Glu
225 230 235

GAT GGG CAT TAT GCA AGA ACA GAT TAT GCA GAG AAT GCT AAC AAA TTA 1129
Asp Gly His Tyr Ala Arg Thr Asp Tyr Ala Glu Asn Ala Asn Lys Leu
240 245 250

GAA GAA AGT GCC AGA GAA CAC CAC ATA CCT TGT CCG GAA CAT TAC AAT 1177
Glu Glu Ser Ala Arg Glu His His Ile Pro Cys Pro Glu His Tyr Asn
255 260 265 270

GGC TTC TGC ATG CAT GGG AAG TGT GAG CAT TCT ATC AAT ATG CAG GAG 1225 
Gly Phe Cys Met His Gly Lys Cys Glu His Ser Ile Asn Met Gln Glu
275 280 285 

CCA TCT TGC AGG TGT GAT GCT GGT TAT ACT GGA CAA CAC TGT GAA AAA 1273
Pro Ser Cys Arg Cys Asp Ala Gly Tyr Thr Gly Gln His Cys Glu Lys
290 295 300 

AAG GAC TAC ACT CTT CTA TAC CTT GTT OOC GGT OCT GTA OGA TTT CAG 1321 
Lys Asp Tyr Ser Val Leu Tyr Val Val Pro Gly Pro Val Arg Phe Gln
305 310 315 

TAT GTC TTA ATC GCA GCT GIG ATT GGA ACA ATT CAG ATT GCT CTC ATC 1369 
Tyr Val Leu Ile Ala Ala Val Ile Gly Thr Ile Gln Ile Ala Val Ile
320 325 330 



TCT GTG CTG CTC CTC TGC ATC ACA AGG AAA TGC CCC AGA AGC AAC AGA 1417 
Cys Val Val Val Leu Cys Ile Thr Arg Lys Cys Pro Arg Ser Asn Arg
335 340 345 350 

ATT CAC AGA CAG AAG CAA AAT ACA GGG CAC TAC ACT TCA GAC AAT ACA 1465 
Ile His Arg Gln Lys Gln Asn Thr Gly His Tyr Ser Ser Asp Asn Thr
355 360 365

ACA AGA GOG TOC AOG AGG TTA ATC TAA AGGGAGCATG TTFCACAGTC 1512 
Thr Arg Ala Ser Thr Arg Leu Ile
370 



GCTGGACTAC OGAGAGCTTG GACTACACAA TACACTATTA TAGACAAAAG AATAAGACAA 1572 

GAGATCTACA CATGTTGOCT TGCATTTGTG GTAATCTACA OCAATGAAAA CATGTACTAC 1632

AGCTATATTT GATTATCTAT GGATATATTT GAAATACTAT ACATTCTCTT GATGTTTTTT 1692 

CTGTAATGTA AATAAACTAT TTATATCAC 1721 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 817 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:



Met Gly Asp Thr Val Val Glu Pro Ala Pro Leu Lys Pro Thr Ser Glu
1 5 10 15

Pro Thr Ser Gly Pro Pro Gly Asn Asn Gly Gly Ser Leu Leu Ser Val
20 25 30

Ile Thr Glu Gly Val Gly Glu Leu Ser Val Ile Asp Pro Glu Val Ala
35 40 45

Gln Lys Ala Cys Gln Glu Val Leu Glu Lys Val Lys Leu Leu His Gly
50 55 60

Gly Val Ala Val Ser Ser Arg Gly Thr Pro Leu Glu Leu Val Asn Gly
65 70 75 80

Asp Gly Val Asp Ser Glu Ile Arg Cys Leu Asp Asp Pro Pro Ala Gln
85 90 95

Ile Arg Glu Glu Glu Asp Glu Met Gly Ala Ala Val Ala Ser Gly Thr
100 105 110

Ala Lys Gly Ala Arg Arg Arg Arg Gln Asn Asn Ser Ala Lys Gln Ser
115 120 125

Trp Leu Leu Arg Leu Phe Glu Ser Lys Leu Phe Asp Ile Ser Met Ala
130 135 140

Ile Ser Tyr Leu Tyr Asn Ser Lys Glu Pro Gly Val Gln Ala Tyr Ile
145 150 155 160

Gly Asn Arg Leu Phe Cys Phe Arg Asn Glu Asp Val Asp Phe Tyr Leu
165 170 175

Pro Gln Leu Leu Asn Met Tyr Ile His Met Asp Glu Asp Val Gly Asp
180 185 190

Ala Ile Lys Pro Tyr Ile Val His Arg Cys Arg Gln Ser Ile Asn Phe
195 200 205

Ser Leu Gln Cys Ala Leu Leu Leu Gly Ala Tyr Ser Ser Asp Met His
210 215 220

Ile Ser Thr Gln Arg His Ser Arg Gly Thr Lys Leu Arg Lys Leu Ile
225 230 235 240

Leu Ser Asp Glu Leu Lys Pro Ala His Arg Lys Arg Glu Leu Pro Ser
245 250 255

Leu Ser Pro Ala Pro Asp Thr Gly Leu Ser Pro Ser Lys Arg Thr His
260 265 270

Gln Arg Ser Lys Ser Asp Ala Thr Ala Ser Ile Ser Leu Ser Ser Asn
275 280 285

Leu Lys Arg Thr Ala Ser Asn Pro Lys Val Glu Asn Glu Asp Glu Glu
290 295 300

Leu Ser Ser Ser Thr Glu Ser Ile Asp Asn Ser Phe Ser Ser Pro Val
305 310 315 320

Arg Leu Ala Pro Glu Arg Glu Phe Ile Lys Ser Leu Met Ala Ile Gly
325 330 335

Lys Arg Leu Ala Thr Leu Pro Thr Lys Glu Gln Lys Thr Gln Arg Leu
340 345 350

Ile Ser Glu Leu Ser Leu Leu Asn His Lys Leu Pro Ala Arg Val Trp
355 360 365

Leu Pro Thr Ala Gly Phe Asp His His Val Val Arg Val Pro His Thr
370 375 380

Gln Ala Val Val Leu Asn Ser Lys Asp Lys Ala Pro Tyr Leu Ile Tyr
385 390 395 400

Val Glu Val Leu Glu Cys Glu Asn Phe Asp Thr Thr Ser Val Pro Ala
405 410 415

Arg Ile Pro Glu Asn Arg Ile Arg Ser Thr Arg Ser Val Glu Asn Leu
420 425 430

Pro Glu Cys Gly Ile Thr His Glu Gln Arg Ala Gly Ser Phe Ser Thr
435 440 445

Val Pro Asn Tyr Asp Asn Asp Asp Glu Ala Trp Ser Val Asp Asp Ile
450 455 460

Gly Glu Leu Gln Val Glu Leu Pro Glu Val His Thr Asn Ser Cys Asp
465 470 475 480

Asn Ile Ser Gln Phe Ser Val Asp Ser Ile Thr Ser Gln Glu Ser Lys
485 490 495

Glu Pro Val Phe Ile Ala Ala Gly Asp Ile Arg Arg Arg Leu Ser Glu
500 505 510

Gln Leu Ala His Thr Pro Thr Ala Phe Lys Arg Asp Pro Glu Asp Pro
515 520 525

Ser Ala Val Ala Leu Lys Glu Pro Trp Gln Glu Lys Val Arg Arg Ile
530 535 540

Arg Glu Gly Ser Pro Tyr Gly His Leu Pro Asn Trp Arg Leu Leu Ser
545 550 555 560

Val Ile Val Lys Cys Gly Asp Asp Leu Arg Gln Glu Leu Leu Ala Phe
565 570 575

Gln Val Leu Lys Gln Leu Gln Ser Ile Trp Glu Gln Glu Arg Val Pro
580 585 590

Leu Trp Ile Lys Pro Ile Gln Asp Ser Cys Glu Ile Thr Thr Asp Ser
595 600 605

Gly Met Ile Glu Pro Val Val Asn Ala Val Ser Ile His Gln Val Lys
610 615 620

Lys Gln Ser Gln Leu Ser Leu Leu Asp Tyr Phe Leu Gln Glu His Gly
625 630 635 640

Ser Tyr Thr Thr Glu Ala Phe Leu Ser Ala Gln Arg Asn Phe Val Gln
645 650 655

Ser Cys Ala Gly Tyr Cys Leu Val Cys Tyr Leu Leu Gln Val Lys Asp
660 665 670

Arg His Asn Gly Asn Ile Leu Leu Asp Ala Glu Gly His Ile Ile His
675 680 685

Ile Asp Phe Gly Phe Ile Leu Ser Ser Ser Pro Arg Asn Leu Gly Phe
690 695 700

Glu Thr Ser Ala Phe Lys Leu Thr Thr Glu Phe Val Asp Val Met Gly
705 710 715 720

Gly Leu Asp Gly Asp Met Phe Asn Tyr Tyr Lys Met Leu Met Leu Gln
725 730 735

Gly Leu Ile Ala Ala Arg Lys His Met Asp Lys Val Val Gln Ile Val
740 745 750

Glu Ile Met Gln Gln Gly Ser Gln Leu Pro Cys Phe His Gly Ser Ser
755 760 765

Thr Ile Arg Asn Leu Lys Glu Arg Phe His Met Ser Met Thr Glu Glu
770 775 780

Gln Leu Gln Leu Leu Val Glu Gln Met Val Asp Gly Ser Met Arg Ser
785 790 795 800

Ile Thr Thr Lys Leu Tyr Asp Gly Phe Gln Tyr Leu Thr Asn Gly Ile
805 810 815

Met



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2451 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 



ATGGGAGATA CAGTAGTGGA GCCTGCCCCC TTGAAGCCAA CTTCTGAGCC CACTTCTGGC 60
CCACCAGGGA ATAATGGGGG GTCCCTGCTA AGTGTCATCA CGGAGGGGGT CGGGGAACTA 120
TCAGTGATTG ACCCTGAGGT GGCCCAGAAG GCCTGCCAGG AGGTGTTGGA GAAAGTCAAG 180
CTTTTGCATG GAGGCGTGGC AGTCTCTAGC AGAGGCACCC CACTGGAGTT GGTCAATGGG 240
GATGGTGTGG ACAGTGAGAT CCGTTGCCTA GATGATCCAC CTGCCCAGAT CAGGGAGGAG 300
GAAGATGAGA TGGGGGCCGC TGTGGCCTCA GGCACAGCCA AAGGAGCAAG AAGACGGCGG 360
CAGAACAACT CAGCTAAACA GTCTTGGCTG CTGAGGCTGT TTGAGTCAAA ACTGTTTGAC 420
ATCTCCATGG CCATTTCATA CCTGTATAAC TCCAAGGAGC CTGGAGTACA AGCCTACATT 480
GGCAACCGGC TCTTCTGCTT TCGCAACGAG GACGTGGACT TCTATCTGCC CCAGTTGCTT 540
AACATGTACA TCCACATGGA TGAGGACGTG GGTGATGCCA TTAAGCCCTA CATAGTCCAC 600
CGTTGCCGCC AGAGCATTAA CTTTTCCCTC CAGTGTGCCC TGTTGCTTGG GGCCTATTCT 660
TCAGACATGC ACATTTCCAC TCAACGACAC TCCCGTGGGA CCAAGCTACG GAAGCTGATC 720
CTCTCAGATG AGCTAAAGCC AGCTCACAGG AAGAGGGAGC TGCCCTCCTT GAGCCCGGCC 780
CCTGATACAG GGCTGTCTCC CTCCAAAAGG ACTCACCAGC GCTCTAAGTC AGATGCCACT 840
GCCAGCATAA GTCTCAGCAG CAACCTGAAA CGAACAGCCA GCAACCCTAA AGTGGAGAAT 900
GAGGATGAGG AGCTCTCCTC CAGCACCGAG AGTATTGATA ATTCATTCAG TTCCCCTGTT 960
CGACTGGCTC CTGAGAGAGA ATTCATCAAG TCCCTGATGG CGATCGGCAA GCGGCTGGCC 1020
ACGCTCCCCA CCAAAGAGCA GAAAACACAG AGGCTGATCT CAGAGCTCTC CCTGCTCAAC 1080
CATAAGCTCC CTGCCCGAGT CTGGCTGCCC ACTGCTGGCT TTGACCACCA CGTGGTCCGT 1140
GTACCCCACA CACAGGCTGT TGTCCTCAAC TCCAAGGACA AGGCTCCCTA CCTGATTTAT 1200
GTGGAAGTCC TTGAATGTGA AAACTTTGAC ACCACCAGTG TCCCTGCCCG GATCCCCGAG 1260
AACCGAATTC GGAGTACGAG GTCCGTAGAA AACTTGCCCG AATGTGGTAT TACCCATGAG 1320
CAGCGAGCTG GCAGCTTCAG CACTGTGCCC AACTATGACA ACGATGATGA GGCCTGGTCG 1380
GTGGATGACA TAGGCGAGCT GCAAGTGGAG CTCCCCGAAG TGCATACCAA CAGCTGTGAC 1440
AACATCTCCC AGTTCTCTGT GGACAGCATC ACCAGCCAGG AGAGCAAGGA GCCTGTGTTC 1500
ATTGCAGCAG GGGACATCCG CCGGCGCCTT TCGGAACAGC TGGCTCATAC CCCGACAGCC 1560
TTCAAACGAG ACCCAGAAGA TCCTTCTGCA GTTGCTCTCA AAGAGCCCTG GCAGGAGAAA 1620
GTACGGCGGA TCAGAGAGGG CTCCCCCTAC GGCCATCTCC CCAATTGGCG GCTCCTGTCA 1680
GTCATTGTCA AGTGTGGGGA TGACCTTCGG CAAGAGCTTC TGGCCTTTCA GGTGTTGAAG 1740
CAACTGCAGT CCATTTGGGA ACAGGAGCGA GTGCCCCTTT GGATCAAGCC AATACAAGAT 1800
TCTTGTGAAA TTACGACTGA TAGTGGCATG ATTGAACCAG TGGTCAATGC TGTGTCCATC 1860
CATCAGGTGA AGAAACAGTC ACAGCTCTCC TTGCTCGATT ACTTCCTACA GGAGCACGGC 1920
AGTTACACCA CTGAGGCATT CCTCAGTGCA CAGCGCAATT TTGTGCAAAG TTGTGCTGGG 1980
TACTGCTTGG TCTGCTACCT GCTGCAAGTC AAGGACAGAC ACAATGGGAA TATCCTTTTG 2040
GACGCAGAAG GCCACATCAT CCACATCGAC TTTGGCTTCA TCCTCTCCAG CTCACCCCGA 2100
AATCTGGGCT TTGAGACGTC AGCCTTTAAG CTGACCACAG AGTTTGTGGA TGTGATGGGC 2160
GGCCTGGATG GCGACATGTT CAACTACTAT AAGATGCTGA TGCTGCAAGG GCTGATTGCC 2220
GCTCGGAAAC ACATGGACAA GGTGGTGCAG ATCGTGGAGA TCATGCAGCA AGGTTCTCAG 2280
CTTCCTTGCT TCCATGGCTC CAGCACCATT CGAAACCTCA AAGAGAGGTT CCACATGAGC 2340
ATGACTGAGG AGCAGCTGCA GCTGCTGGTG GAGCAGATGG TGGATGGCAG TATGCGGTCT 2400
ATCACCACCA AACTCTATGA CGGCTTCCAG TACCTCACCA ACGGCATCAT G 2451
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3602 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-428B12c2 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 429..2879
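
The CDS location above can be cross-checked against the lengths stated for SEQ ID NO: 28 (817 amino acids) and SEQ ID NO: 29 (2451 base pairs). The following is only an illustrative sketch and not part of the original listing: the helper name `cds_summary` is hypothetical, and it assumes the location is 1-based, inclusive, and excludes the stop codon.

```python
from typing import Tuple


def cds_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (length in bases, number of codons) for a CDS range.

    Assumes `start` and `end` are 1-based, inclusive positions and that
    the range does not include the stop codon (an assumption, see above).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length, length // 3


# LOCATION 429..2879 of SEQ ID NO: 30
length, codons = cds_summary(429, 2879)
print(length, codons)  # 2451 817
```

Under these assumptions the range spans 2451 bases, that is 817 codons, which agrees with the stated lengths of SEQ ID NO: 29 and SEQ ID NO: 28.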

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 



GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGACAA GGCAGATCCC TTGAGCCCAG 

GAGGTAGAGG CTGCAGTGAG CTGTGATGGT GCCACTGCAC TCCAGCCTGG GCAATGAAGC 

AAGACCCTAT CTGAAAAAAA AAATTTITAA AAAAGGCAAA GATGGGCCTG GGGCACCAAA 

TATTCCAGAG GAAAGGGAAC GTGTGTACTC CTTGAGGTGG GGAACATGAC CCACTTGAGG 

TGCAGAAAGA AGACTTGTAT GGGGCTGGTG CAGCCTCCGC GGCOGCTGTC AGGGAAGCGC 

AGGOGGOCAA TGGAACOOGG GAGOGGTCGC TGCTGCTGAG GCGGCAGTGT CGGCAGTCCA 

ACCGOGACTG CXTGCACCCC CTOOGOGGGG TCCCCCAGAG CTTGGAAGCT CGAAGTCTGG 

CTGTGGCC ATG GGA GAT ACA GTA GTG GAG CCT GCC CCC TTG AAG CCA ACT 470
Met Gly Asp Thr Val Val Glu Pro Ala Pro Leu Lys Pro Thr
1 5 10

TCT GAG CCC ACT TCT GGC CCA CCA GGG AAT AAT GGG GGG TCC CTG CTA 518
Ser Glu Pro Thr Ser Gly Pro Pro Gly Asn Asn Gly Gly Ser Leu Leu
15 20 25 30

AGT GTC ATC ACG GAG GGG GTC GGG GAA CTA TCA GTG ATT GAC CCT GAG 566
Ser Val Ile Thr Glu Gly Val Gly Glu Leu Ser Val Ile Asp Pro Glu
35 40 45

GTG GCC CAG AAG GCC TGC CAG GAG GTG TTG GAG AAA GTC AAG CTT TTG 614
Val Ala Gln Lys Ala Cys Gln Glu Val Leu Glu Lys Val Lys Leu Leu
50 55 60

CAT GGA GGC GTG GCA GTC TCT AGC AGA GGC ACC CCA CTG GAG TTG GTC 662
His Gly Gly Val Ala Val Ser Ser Arg Gly Thr Pro Leu Glu Leu Val
65 70 75

AAT GGG GAT GGT GTG GAC AGT GAG ATC CGT TGC CTA GAT GAT CCA CCT 710
Asn Gly Asp Gly Val Asp Ser Glu Ile Arg Cys Leu Asp Asp Pro Pro
80 85 90

GCC CAG ATC AGG GAG GAG GAA GAT GAG ATG GGG GCC GCT GTG GCC TCA 758
Ala Gln Ile Arg Glu Glu Glu Asp Glu Met Gly Ala Ala Val Ala Ser
95 100 105 110

GGC ACA GCC AAA GGA GCA AGA AGA CGG CGG CAG AAC AAC TCA GCT AAA 806
Gly Thr Ala Lys Gly Ala Arg Arg Arg Arg Gln Asn Asn Ser Ala Lys
115 120 125

CAG TCT TGG CTG CTG AGG CTG TTT GAG TCA AAA CTG TTT GAC ATC TCC 854
Gln Ser Trp Leu Leu Arg Leu Phe Glu Ser Lys Leu Phe Asp Ile Ser
130 135 140

ATG GCC ATT TCA TAC CTG TAT AAC TCC AAG GAG CCT GGA GTA CAA GCC 902
Met Ala Ile Ser Tyr Leu Tyr Asn Ser Lys Glu Pro Gly Val Gln Ala
145 150 155

TAC ATT GGC AAC CGG CTC TTC TGC TTT CGC AAC GAG GAC GTG GAC TTC 950
Tyr Ile Gly Asn Arg Leu Phe Cys Phe Arg Asn Glu Asp Val Asp Phe
160 165 170

TAT CTG CCC CAG TTG CTT AAC ATG TAC ATC CAC ATG GAT GAG GAC GTG 998
Tyr Leu Pro Gln Leu Leu Asn Met Tyr Ile His Met Asp Glu Asp Val
175 180 185 190

GGT GAT GCC ATT AAG CCC TAC ATA GTC CAC CGT TGC CGC CAG AGC ATT 1046
Gly Asp Ala Ile Lys Pro Tyr Ile Val His Arg Cys Arg Gln Ser Ile
195 200 205

AAC TTT TCC CTC CAG TGT GCC CTG TTG CTT GGG GCC TAT TCT TCA GAC 1094
Asn Phe Ser Leu Gln Cys Ala Leu Leu Leu Gly Ala Tyr Ser Ser Asp
210 215 220

ATG CAC ATT TCC ACT CAA CGA CAC TCC CGT GGG ACC AAG CTA CGG AAG 1142
Met His Ile Ser Thr Gln Arg His Ser Arg Gly Thr Lys Leu Arg Lys
225 230 235

CTG ATC CTC TCA GAT GAG CTA AAG CCA GCT CAC AGG AAG AGG GAG CTG 1190
Leu Ile Leu Ser Asp Glu Leu Lys Pro Ala His Arg Lys Arg Glu Leu
240 245 250

CCC TCC TTG AGC CCG GCC CCT GAT ACA GGG CTG TCT CCC TCC AAA AGG 1238
Pro Ser Leu Ser Pro Ala Pro Asp Thr Gly Leu Ser Pro Ser Lys Arg
255 260 265 270

ACT CAC CAG CGC TCT AAG TCA GAT GCC ACT GCC AGC ATA AGT CTC AGC 1286
Thr His Gln Arg Ser Lys Ser Asp Ala Thr Ala Ser Ile Ser Leu Ser
275 280 285

AGC AAC CTG AAA CGA ACA GCC AGC AAC CCT AAA GTG GAG AAT GAG GAT 1334
Ser Asn Leu Lys Arg Thr Ala Ser Asn Pro Lys Val Glu Asn Glu Asp
290 295 300

GAG GAG CTC TCC TCC AGC ACC GAG AGT ATT GAT AAT TCA TTC AGT TCC 1382
Glu Glu Leu Ser Ser Ser Thr Glu Ser Ile Asp Asn Ser Phe Ser Ser
305 310 315

CCT GTT CGA CTG GCT CCT GAG AGA GAA TTC ATC AAG TCC CTG ATG GCG 1430
Pro Val Arg Leu Ala Pro Glu Arg Glu Phe Ile Lys Ser Leu Met Ala
320 325 330

ATC GGC AAG CGG CTG GCC ACG CTC CCC ACC AAA GAG CAG AAA ACA CAG 1478
Ile Gly Lys Arg Leu Ala Thr Leu Pro Thr Lys Glu Gln Lys Thr Gln
335 340 345 350

AGG CTG ATC TCA GAG CTC TCC CTG CTC AAC CAT AAG CTC CCT GCC CGA 1526
Arg Leu Ile Ser Glu Leu Ser Leu Leu Asn His Lys Leu Pro Ala Arg
355 360 365

GTC TGG CTG CCC ACT GCT GGC TTT GAC CAC CAC GTG GTC CGT GTA CCC 1574
Val Trp Leu Pro Thr Ala Gly Phe Asp His His Val Val Arg Val Pro
370 375 380

CAC ACA CAG GCT GTT GTC CTC AAC TCC AAG GAC AAG GCT CCC TAC CTG 1622
His Thr Gln Ala Val Val Leu Asn Ser Lys Asp Lys Ala Pro Tyr Leu
385 390 395

ATT TAT GTG GAA GTC CTT GAA TGT GAA AAC TTT GAC ACC ACC AGT GTC 1670
Ile Tyr Val Glu Val Leu Glu Cys Glu Asn Phe Asp Thr Thr Ser Val
400 405 410

CCT GCC CGG ATC CCC GAG AAC CGA ATT CGG AGT ACG AGG TCC GTA GAA 1718
Pro Ala Arg Ile Pro Glu Asn Arg Ile Arg Ser Thr Arg Ser Val Glu
415 420 425 430

AAC TTG CCC GAA TGT GGT ATT ACC CAT GAG CAG CGA GCT GGC AGC TTC 1766
Asn Leu Pro Glu Cys Gly Ile Thr His Glu Gln Arg Ala Gly Ser Phe
435 440 445

AGC ACT GTG CCC AAC TAT GAC AAC GAT GAT GAG GCC TGG TCG GTG GAT 1814
Ser Thr Val Pro Asn Tyr Asp Asn Asp Asp Glu Ala Trp Ser Val Asp
450 455 460

GAC ATA GGC GAG CTG CAA GTG GAG CTC CCC GAA GTG CAT ACC AAC AGC 1862
Asp Ile Gly Glu Leu Gln Val Glu Leu Pro Glu Val His Thr Asn Ser
465 470 475

TGT GAC AAC ATC TCC CAG TTC TCT GTG GAC AGC ATC ACC AGC CAG GAG 1910
Cys Asp Asn Ile Ser Gln Phe Ser Val Asp Ser Ile Thr Ser Gln Glu
480 485 490

AGC AAG GAG CCT GTG TTC ATT GCA GCA GGG GAC ATC CGC CGG CGC CTT 1958
Ser Lys Glu Pro Val Phe Ile Ala Ala Gly Asp Ile Arg Arg Arg Leu
495 500 505 510

TCG GAA CAG CTG GCT CAT ACC CCG ACA GCC TTC AAA CGA GAC CCA GAA 2006
Ser Glu Gln Leu Ala His Thr Pro Thr Ala Phe Lys Arg Asp Pro Glu
515 520 525

GAT CCT TCT GCA GTT GCT CTC AAA GAG CCC TGG CAG GAG AAA GTA CGG 2054
Asp Pro Ser Ala Val Ala Leu Lys Glu Pro Trp Gln Glu Lys Val Arg
530 535 540

CGG ATC AGA GAG GGC TCC CCC TAC GGC CAT CTC CCC AAT TGG CGG CTC 2102
Arg Ile Arg Glu Gly Ser Pro Tyr Gly His Leu Pro Asn Trp Arg Leu
545 550 555

CTG TCA GTC ATT GTC AAG TGT GGG GAT GAC CTT CGG CAA GAG CTT CTG 2150
Leu Ser Val Ile Val Lys Cys Gly Asp Asp Leu Arg Gln Glu Leu Leu
560 565 570

GCC TTT CAG GTG TTG AAG CAA CTG CAG TCC ATT TGG GAA CAG GAG CGA 2198
Ala Phe Gln Val Leu Lys Gln Leu Gln Ser Ile Trp Glu Gln Glu Arg
575 580 585 590

GTG CCC CTT TGG ATC AAG CCA ATA CAA GAT TCT TGT GAA ATT ACG ACT 2246
Val Pro Leu Trp Ile Lys Pro Ile Gln Asp Ser Cys Glu Ile Thr Thr
595 600 605

GAT AGT GGC ATG ATT GAA CCA GTG GTC AAT GCT GTG TCC ATC CAT CAG 2294
Asp Ser Gly Met Ile Glu Pro Val Val Asn Ala Val Ser Ile His Gln
610 615 620

GTG AAG AAA CAG TCA CAG CTC TCC TTG CTC GAT TAC TTC CTA CAG GAG 2342
Val Lys Lys Gln Ser Gln Leu Ser Leu Leu Asp Tyr Phe Leu Gln Glu
625 630 635

CAC GGC AGT TAC ACC ACT GAG GCA TTC CTC AGT GCA CAG CGC AAT TTT 2390
His Gly Ser Tyr Thr Thr Glu Ala Phe Leu Ser Ala Gln Arg Asn Phe
640 645 650

GTG CAA AGT TGT GCT GGG TAC TGC TTG GTC TGC TAC CTG CTG CAA GTC 2438
Val Gln Ser Cys Ala Gly Tyr Cys Leu Val Cys Tyr Leu Leu Gln Val
655 660 665 670

AAG GAC AGA CAC AAT GGG AAT ATC CTT TTG GAC GCA GAA GGC CAC ATC 2486
Lys Asp Arg His Asn Gly Asn Ile Leu Leu Asp Ala Glu Gly His Ile
675 680 685

ATC CAC ATC GAC TTT GGC TTC ATC CTC TCC AGC TCA CCC CGA AAT CTG 2534
Ile His Ile Asp Phe Gly Phe Ile Leu Ser Ser Ser Pro Arg Asn Leu
690 695 700

GGC TTT GAG ACG TCA GCC TTT AAG CTG ACC ACA GAG TTT GTG GAT GTG 2582
Gly Phe Glu Thr Ser Ala Phe Lys Leu Thr Thr Glu Phe Val Asp Val
705 710 715

ATG GGC GGC CTG GAT GGC GAC ATG TTC AAC TAC TAT AAG ATG CTG ATG 2630
Met Gly Gly Leu Asp Gly Asp Met Phe Asn Tyr Tyr Lys Met Leu Met
720 725 730

CTG CAA GGG CTG ATT GCC GCT CGG AAA CAC ATG GAC AAG GTG GTG CAG 2678
Leu Gln Gly Leu Ile Ala Ala Arg Lys His Met Asp Lys Val Val Gln
735 740 745 750

ATC GTG GAG ATC ATG CAG CAA GGT TCT CAG CTT CCT TGC TTC CAT GGC 2726
Ile Val Glu Ile Met Gln Gln Gly Ser Gln Leu Pro Cys Phe His Gly
755 760 765

TCC AGC ACC ATT CGA AAC CTC AAA GAG AGG TTC CAC ATG AGC ATG ACT 2774
Ser Ser Thr Ile Arg Asn Leu Lys Glu Arg Phe His Met Ser Met Thr
770 775 780

GAG GAG CAG CTG CAG CTG CTG GTG GAG CAG ATG GTG GAT GGC AGT ATG 2822
Glu Glu Gln Leu Gln Leu Leu Val Glu Gln Met Val Asp Gly Ser Met
785 790 795

CGG TCT ATC ACC ACC AAA CTC TAT GAC GGC TTC CAG TAC CTC ACC AAC 2870
Arg Ser Ile Thr Thr Lys Leu Tyr Asp Gly Phe Gln Tyr Leu Thr Asn
800 805 810

GGC ATC ATG TGA CACGCTCCTC AGOCCAGGAG TGGTGGGGGG TCCAGGGCAC 2922
Gly Ile Met *
815

CCTCOCTAGA GGGCCCTTGT CTGAGAAACC CCAAACCAGG AAACCOCAOC TACCCAACCA 2982 
ATGTGGTAAC TGCGAGAGCT 3042

TTGGGGCTTC OCTGCCCCTC 3102 

TCACTGCCCT CCAGAAAACA 3162 

TTGTAGGGGT CTCTCAGAGG 3222 

AGGAAGTGGG GAAGAGTAGG 3282 

CATGCTGCTG COCAGCTCTA 3342 

GCCCAAGCTC CCCTTGCTGG 3402 

CATGGGCAAG GGAAGGGAAT 3462 

ATGTGGAATT OCCTACCCTG 3522 

TATTTTTAAT TTTTGTTTGA 3582 

3602 

(A) LENGTH: 829 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Arg Phe Leu Glu Ala Arg Ser Leu Ala Val Ala Met Gly Asp Thr
1 5 10 15

Val Val Glu Pro Ala Pro Leu Lys Pro Thr Ser Glu Pro Thr Ser Gly
20 25 30

Pro Pro Gly Asn Asn Gly Gly Ser Leu Leu Ser Val Ile Thr Glu Gly
35 40 45

Val Gly Glu Leu Ser Val Ile Asp Pro Glu Val Ala Gln Lys Ala Cys
50 55 60

Gln Glu Val Leu Glu Lys Val Lys Leu Leu His Gly Gly Val Ala Val
65 70 75 80





TOCAOOCAAG 


GGAAATGGAA 


GGCAAGAAAC 


ACGAAGGATC 




TGCTGAGGGG 


TGGGAGAGCC 


AGCTGTGGGG 


TOCAGACTTG 




CTGGTCTGTG 


TCAGTATTAC 


CACCAGACTG 


ACTCCAGGAC 




GAGGTGACAA 


ATGTGAGGGA 


C^CTGGGGOC 


TTTCTTCTOC 




TTCTTT0CAC 


AGGCCATCCT 


CTTATT00GT 


TCTG0GG00C 




TTCTCGGTAC 


TTAGGACTTG 
GGAlOCTACCC 


ATOCTGTGGT 


TGCCACTGGC 




aXXTOCCAG 


CTOCCAGGGA 


CCX3AD0CCTG 














TCXX^CAGOC 


CTCCAGTGTA 


CTGAGGCTAC 


TGGOCTAGOC 




ACTCCTTCCC 
AATAAAGTCC 


CAAAOOCAGG 
TTAGTTAGOC 


GAAAAGAGCT 


CTCAArrrrr 



(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 
Ser Ser Arg Gly Thr Pro Leu Glu Leu Val Asn Gly Asp Gly Val Asp
85 90 95

Ser Glu Ile Arg Cys Leu Asp Asp Pro Pro Ala Gln Ile Arg Glu Glu
100 105 110

Glu Asp Glu Met Gly Ala Ala Val Ala Ser Gly Thr Ala Lys Gly Ala
115 120 125

Arg Arg Arg Arg Gln Asn Asn Ser Ala Lys Gln Ser Trp Leu Leu Arg
130 135 140

Leu Phe Glu Ser Lys Leu Phe Asp Ile Ser Met Ala Ile Ser Tyr Leu
145 150 155 160

Tyr Asn Ser Lys Glu Pro Gly Val Gln Ala Tyr Ile Gly Asn Arg Leu
165 170 175

Phe Cys Phe Arg Asn Glu Asp Val Asp Phe Tyr Leu Pro Gln Leu Leu
180 185 190

Asn Met Tyr Ile His Met Asp Glu Asp Val Gly Asp Ala Ile Lys Pro
195 200 205

Tyr Ile Val His Arg Cys Arg Gln Ser Ile Asn Phe Ser Leu Gln Cys
210 215 220

Ala Leu Leu Leu Gly Ala Tyr Ser Ser Asp Met His Ile Ser Thr Gln
225 230 235 240

Arg His Ser Arg Gly Thr Lys Leu Arg Lys Leu Ile Leu Ser Asp Glu
245 250 255

Leu Lys Pro Ala His Arg Lys Arg Glu Leu Pro Ser Leu Ser Pro Ala
260 265 270

Pro Asp Thr Gly Leu Ser Pro Ser Lys Arg Thr His Gln Arg Ser Lys
275 280 285

Ser Asp Ala Thr Ala Ser Ile Ser Leu Ser Ser Asn Leu Lys Arg Thr
290 295 300

Ala Ser Asn Pro Lys Val Glu Asn Glu Asp Glu Glu Leu Ser Ser Ser
305 310 315 320

Thr Glu Ser Ile Asp Asn Ser Phe Ser Ser Pro Val Arg Leu Ala Pro
325 330 335

Glu Arg Glu Phe Ile Lys Ser Leu Met Ala Ile Gly Lys Arg Leu Ala
340 345 350

Thr Leu Pro Thr Lys Glu Gln Lys Thr Gln Arg Leu Ile Ser Glu Leu
355 360 365

Ser Leu Leu Asn His Lys Leu Pro Ala Arg Val Trp Leu Pro Thr Ala
370 375 380

Gly Phe Asp His His Val Val Arg Val Pro His Thr Gln Ala Val Val
385 390 395 400

Leu Asn Ser Lys Asp Lys Ala Pro Tyr Leu Ile Tyr Val Glu Val Leu
405 410 415

Glu Cys Glu Asn Phe Asp Thr Thr Ser Val Pro Ala Arg Ile Pro Glu
420 425 430

Asn Arg Ile Arg Ser Thr Arg Ser Val Glu Asn Leu Pro Glu Cys Gly
435 440 445

Ile Thr His Glu Gln Arg Ala Gly Ser Phe Ser Thr Val Pro Asn Tyr
450 455 460

Asp Asn Asp Asp Glu Ala Trp Ser Val Asp Asp Ile Gly Glu Leu Gln
465 470 475 480

Val Glu Leu Pro Glu Val His Thr Asn Ser Cys Asp Asn Ile Ser Gln
485 490 495

Phe Ser Val Asp Ser Ile Thr Ser Gln Glu Ser Lys Glu Pro Val Phe
500 505 510

Ile Ala Ala Gly Asp Ile Arg Arg Arg Leu Ser Glu Gln Leu Ala His
515 520 525

Thr Pro Thr Ala Phe Lys Arg Asp Pro Glu Asp Pro Ser Ala Val Ala
530 535 540

Leu Lys Glu Pro Trp Gln Glu Lys Val Arg Arg Ile Arg Glu Gly Ser
545 550 555 560

Pro Tyr Gly His Leu Pro Asn Trp Arg Leu Leu Ser Val Ile Val Lys
565 570 575

Cys Gly Asp Asp Leu Arg Gln Glu Leu Leu Ala Phe Gln Val Leu Lys
580 585 590

Gln Leu Gln Ser Ile Trp Glu Gln Glu Arg Val Pro Leu Trp Ile Lys
595 600 605

Pro Ile Gln Asp Ser Cys Glu Ile Thr Thr Asp Ser Gly Met Ile Glu
610 615 620

Pro Val Val Asn Ala Val Ser Ile His Gln Val Lys Lys Gln Ser Gln
625 630 635 640

Leu Ser Leu Leu Asp Tyr Phe Leu Gln Glu His Gly Ser Tyr Thr Thr
645 650 655

Glu Ala Phe Leu Ser Ala Gln Arg Asn Phe Val Gln Ser Cys Ala Gly
660 665 670

Tyr Cys Leu Val Cys Tyr Leu Leu Gln Val Lys Asp Arg His Asn Gly
675 680 685

Asn Ile Leu Leu Asp Ala Glu Gly His Ile Ile His Ile Asp Phe Gly
690 695 700

Phe Ile Leu Ser Ser Ser Pro Arg Asn Leu Gly Phe Glu Thr Ser Ala
705 710 715 720

Phe Lys Leu Thr Thr Glu Phe Val Asp Val Met Gly Gly Leu Asp Gly
725 730 735

Asp Met Phe Asn Tyr Tyr Lys Met Leu Met Leu Gln Gly Leu Ile Ala
740 745 750

Ala Arg Lys His Met Asp Lys Val Val Gln Ile Val Glu Ile Met Gln
755 760 765

Gln Gly Ser Gln Leu Pro Cys Phe His Gly Ser Ser Thr Ile Arg Asn
770 775 780

Leu Lys Glu Arg Phe His Met Ser Met Thr Glu Glu Gln Leu Gln Leu
785 790 795 800

Leu Val Glu Gln Met Val Asp Gly Ser Met Arg Ser Ile Thr Thr Lys
805 810 815

Leu Tyr Asp Gly Phe Gln Tyr Leu Thr Asn Gly Ile Met
820 825



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2487 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:



ATGAGATTCT TGGAAGCTCG AAGTCTGGCT GTGGCCATGG GAGATACAGT AGTGGAGCCT 60
GCCCCCTTGA AGCCAACTTC TGAGCCCACT TCTGGCCCAC CAGGGAATAA TGGGGGGTCC 120
CTGCTAAGTG TCATCACGGA GGGGGTCGGG GAACTATCAG TGATTGACCC TGAGGTGGCC 180
CAGAAGGCCT GCCAGGAGGT GTTGGAGAAA GTCAAGCTTT TGCATGGAGG CGTGGCAGTC 240
TCTAGCAGAG GCACCCCACT GGAGTTGGTC AATGGGGATG GTGTGGACAG TGAGATCCGT 300
TGCCTAGATG ATCCACCTGC CCAGATCAGG GAGGAGGAAG ATGAGATGGG GGCCGCTGTG 360
GCCTCAGGCA CAGCCAAAGG AGCAAGAAGA CGGCGGCAGA ACAACTCAGC TAAACAGTCT 420
TGGCTGCTGA GGCTGTTTGA GTCAAAACTG TTTGACATCT CCATGGCCAT TTCATACCTG 480
TATAACTCCA AGGAGCCTGG AGTACAAGCC TACATTGGCA ACCGGCTCTT CTGCTTTCGC 540
AACGAGGACG TGGACTTCTA TCTGCCCCAG TTGCTTAACA TGTACATCCA CATGGATGAG 600
GACGTGGGTG ATGCCATTAA GCCCTACATA GTCCACCGTT GCCGCCAGAG CATTAACTTT 660
TCCCTCCAGT GTGCCCTGTT GCTTGGGGCC TATTCTTCAG ACATGCACAT TTCCACTCAA 720
CGACACTCCC GTGGGACCAA GCTACGGAAG CTGATCCTCT CAGATGAGCT AAAGCCAGCT 780
CACAGGAAGA GGGAGCTGCC CTCCTTGAGC CCGGCCCCTG ATACAGGGCT GTCTCCCTCC 840
AAAAGGACTC ACCAGCGCTC TAAGTCAGAT GCCACTGCCA GCATAAGTCT CAGCAGCAAC 900
CTGAAACGAA CAGCCAGCAA CCCTAAAGTG GAGAATGAGG ATGAGGAGCT CTCCTCCAGC 960
ACCGAGAGTA TTGATAATTC ATTCAGTTCC CCTGTTCGAC TGGCTCCTGA GAGAGAATTC 1020
ATCAAGTCCC TGATGGCGAT CGGCAAGCGG CTGGCCACGC TCCCCACCAA AGAGCAGAAA 1080
ACACAGAGGC TGATCTCAGA GCTCTCCCTG CTCAACCATA AGCTCCCTGC CCGAGTCTGG 1140
CTGCCCACTG CTGGCTTTGA CCACCACGTG GTCCGTGTAC CCCACACACA GGCTGTTGTC 1200
CTCAACTCCA AGGACAAGGC TCCCTACCTG ATTTATGTGG AAGTCCTTGA ATGTGAAAAC 1260
TTTGACACCA CCAGTGTCCC TGCCCGGATC CCCGAGAACC GAATTCGGAG TACGAGGTCC 1320
GTAGAAAACT TGCCCGAATG TGGTATTACC CATGAGCAGC GAGCTGGCAG CTTCAGCACT 1380
GTGCCCAACT ATGACAACGA TGATGAGGCC TGGTCGGTGG ATGACATAGG CGAGCTGCAA 1440
GTGGAGCTCC CCGAAGTGCA TACCAACAGC TGTGACAACA TCTCCCAGTT CTCTGTGGAC 1500
AGCATCACCA GCCAGGAGAG CAAGGAGCCT GTGTTCATTG CAGCAGGGGA CATCCGCCGG 1560
CGCCTTTCGG AACAGCTGGC TCATACCCCG ACAGCCTTCA AACGAGACCC AGAAGATCCT 1620
TCTGCAGTTG CTCTCAAAGA GCCCTGGCAG GAGAAAGTAC GGCGGATCAG AGAGGGCTCC 1680
CCCTACGGCC ATCTCCCCAA TTGGCGGCTC CTGTCAGTCA TTGTCAAGTG TGGGGATGAC 1740
CTTCGGCAAG AGCTTCTGGC CTTTCAGGTG TTGAAGCAAC TGCAGTCCAT TTGGGAACAG 1800
GAGCGAGTGC CCCTTTGGAT CAAGCCAATA CAAGATTCTT GTGAAATTAC GACTGATAGT 1860
GGCATGATTG AACCAGTGGT CAATGCTGTG TCCATCCATC AGGTGAAGAA ACAGTCACAG 1920
CTCTCCTTGC TCGATTACTT CCTACAGGAG CACGGCAGTT ACACCACTGA GGCATTCCTC 1980
AGTGCACAGC GCAATTTTGT GCAAAGTTGT GCTGGGTACT GCTTGGTCTG CTACCTGCTG 2040
CAAGTCAAGG ACAGACACAA TGGGAATATC CTTTTGGACG CAGAAGGCCA CATCATCCAC 2100
ATCGACTTTG GCTTCATCCT CTCCAGCTCA CCCCGAAATC TGGGCTTTGA GACGTCAGCC 2160
TTTAAGCTGA CCACAGAGTT TGTGGATGTG ATGGGCGGCC TGGATGGCGA CATGTTCAAC 2220
TACTATAAGA TGCTGATGCT GCAAGGGCTG ATTGCCGCTC GGAAACACAT GGACAAGGTG 2280
GTGCAGATCG TGGAGATCAT GCAGCAAGGT TCTCAGCTTC CTTGCTTCCA TGGCTCCAGC 2340
ACCATTCGAA ACCTCAAAGA GAGGTTCCAC ATGAGCATGA CTGAGGAGCA GCTGCAGCTG 2400
CTGGTGGAGC AGATGGTGGA TGGCAGTATG CGGTCTATCA CCACCAAACT CTATGACGGC 2460
TTCCAGTACC TCACCAACGG CATCATG 2487






(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3324 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vii) IMMEDIATE SOURCE:

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-428B12cl 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 115..2601

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 






CCGGAATTCC GGGAAGGCCG GAGCAAGTTT TGAAGAAGTC CCTATCAGAT TACACTTGGT 60 

TGACTACTCC GGAGCAGCCA CTAAGAGGGA TGAACAOOCC TGOGTGGAAA TTGA ATG 117 

Met 
1 

AGA TTC TTG GAA OCT CGA ACT CTG OCT GTG GCC ATG GGA GAT ACA GTA 165 
Arg Phe Leu Glu Ala Arg Ser Leu Ala Val Ala Met Gly Asp Thr Val 
5 10 15 



CTG GAG OCT GCC CCC TTG AAG OCA ACT TCT GAG CCC ACT TCT GGC CCA 213 
Val Glu Pro Ala Pro Leu Lys Pro Thr Ser Glu Pro Thr Ser Gly Pro 
25 20 25 30 

CCA GGG AAT AAT GGG GGG T0C CTG CTA ACT GTC ATC ACG GAG GGG GTC 261 
Pro Gly Asn Asn Gly Gly Ser Leu Leu Ser Val lie Thr Glu Gly Val 
35 40 45 

30 

GGG GAA CTA TCA GTG ATT GAC OCT GAG GTG GCC CAG AAG GCC TGC CAG 309 
Gly Glu Leu Ser Val He Asp Pro Glu Val Ala Gin Lys Ala Cys Gin 
50 55 60 65 

35 GAG GTG TTG GAG AAA GTC AAG CTT TTG CAT GGA GGC GTG GCA GTC TCT 357 
Glu Val Leu Glu Lys Val Lys Leu Leu His Gly Gly Val Ala Val Ser 
70 75 80 

AGC AGA GGC ACC CCA CTG GAG TTG GTC AAT GGG GAT GGT GTG GAC ACT 405 
40 Ser Arg Gly Thr Pro Leu Glu Leu Val Asn Gly Asp Gly Val Asp Ser 

85 90 " 95 

GAG ATC OCT TGC CTA GAT GAT CCA OCT GCC CAG ATC AGG GAG GAG GAA 453 
Glu He Arg Cys Leu Asp Asp Pro Pro Ala Gin He Arg Glu Glu Glu 
45 100 105 110 

GAT GAG ATG GGG GCC GCT GTG GCC TCA GGC ACA GCC AAA GGA GCA AGA 501 
Asp Glu Met Gly Ala Ala Val Ala Ser Gly Thr Ala Lys Gly Ala Arg 
115 120 125 



50 



AGA CGG CGG CAG AAC AAC TCA GCT AAA CAG TCT TGG CTG CTG AGG CTG 549 



55 
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Arg Arg Arg Gin Asn Asn Ser Ala Lys Gin Ser Trp Leu Leu Arg Leu 
130 135 140 145 



TTT GAG TCA AAA CTG TTT GAG ATC TCC ATG GCC ATT TCA TAG CTG TAT 
Phe Glu Ser Lys Leu Phe Asp He Ser Met Ala He Ser Tyr Leu Tyr 
150 155 160 



597 



10 



AAC TOC AAG GAG OCT GGA GTA CAA GCC TAG ATT GGC AAC CGG CTC TTC 
Asn Ser Lys Glu Pro Gly Val Gin Ala Tyr He Gly Asn Arg Leu Phe 
165 170 175 



645 



75 



TGC TTT CGC AAC GAG GAC GTG GAC TTC TAT CTG GCC CAG TTG CTT AAC 
Cys Phe Arg Asn Glu Asp Val Asp Phe Tyr Leu Pro Gin Leu Leu Asn 
180 185 190 



693 



20 



25 



30 



ATG TAG ATC CAC ATG GAT GAG GAC GTG GGT GAT GCC ATT AAG CCC TAG 741 
Met Tyr He His Met Asp Glu Asp Val Gly Asp Ala He Lys Pro Tyr 
195 200 205 

ATA CTC CAC CGT TGC CGC CAG AGC ATT AAC TTT TCC CTC CAG TGT GCC 789 
He Val His Arg Cys Arg Gin Ser He Asn Phe Ser Leu Gin Cys Ala 
210 215 220 225 

CTG TTC CTT GGG GCC TAT TCT TCA GAC ATG CAC ATT TOC ACT CAA CGA 837 
Leu Leu Leu Gly Ala Tyr Ser Ser Asp Met His He Ser Thr Gin Arg 
230 235 240 

CAC TOC CGT GGG ACC AAG CTA CGG AAG CTG ATC CTC TCA GAT GAG CTA 885 
His Ser Arg Gly Thr Lys Leu Arg Lys Leu He Leu Ser Asp Glu Leu 
245 250 255 



35 



40 



AAG CCA GCT CAC AGG AAG AGG GAG CTG CCC TCC TTG AGC COG GCC OCT 933 
Lys Pro Ala His Arg Lys Arg Glu Leu Pro Ser Leu Ser Pro Ala Pro 
260 265 270 

GAT ACA GGG CTG TCT CCC TCC AAA AGG ACT CAC CAG CGC TCT AAG TCA 981 
Asp Thr Gly Leu Ser Pro Ser Lys Arg Thr His Gin Arg Ser Lys Ser 
275 280 285 

GAT GCC ACT GCC AGC ATA ACT CTC AGC AGC AAC CTG AAA CGA ACA GOC 1029 
Asp Ala Thr Ala Ser He Ser Leu Ser Ser Asn Leu Lys Arg Thr Ala 
290 295 300 305 



45 



50 



AGC AAC OCT AAA GTG GAG AAT GAG GAT GAG GAG CTC TCC TCC AGC ACC 1077 
Ser Asn Pro Lys Val Glu Asn Glu Asp Glu Glu Leu Ser Ser Ser Thr 
310 315 320 

GAG ACT ATT GAT AAT TCA TTC ACT TCC OCT GTT CGA CTG GCT OCT GAG 1125 
Glu Ser He Asp Asn Ser Phe Ser Ser Pro Val Arg Leu Ala Pro Glu 
325 330 335 
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AGA GAA TTC ATC AAG TCC CTG ATG GOG ATC GGC AAG CGG CTG GOC ACG 
Arg Glu Phe lie Lys Ser Leu Met Ala He Gly Lys Arg Leu Ala Thr 
340 345 350 



1173 



CTC OCC ACC AAA GAG CAG AAA ACA CAG AGG CTG ATC TCA GAG CTC TOC 
Leu Pro Thr Lys Glu Gin Lys Thr Gin Arg Leu He Ser Glu Leu Ser 
355 360 365 



1221 



10 CTG CTC AAC CAT AAG CTC OCT GCC OGA CTC TGG CTG COC ACT GCT GGC 
Leu Leu Asn His Lys Leu Pro Ala Arg Val Trp Leu Pro Thr Ala Gly 
370 375 380 385 



1269 



TTT GAC CAC CAC GTG GTC CGT GTA OCC CAC ACA CAG GCT GTT GTC CTC 
is Phe Asp His His Val Val Arg Val Pro His Thr Gin Ala Val Val Leu 

390 395 400 



1317 



20 



25 



30 



AAC TCC AAG GAC AAG GCT COC TAG CTG ATT TAT GTG GAA GTC CTT GAA 
Asn Ser Lys Asp Lys Ala Pro Tyr Leu He Tyr Val Glu Val Leu Glu 
405 410 415 



ACC CAT GAG CAG OGA GCT GGC AGO TTC AGO ACT GTG COC AAC TAT GAC 
Thr His Glu Gin Arg Ala Gly Ser Phe Ser Thr Val Pro Asn Tyr Asp 
450 455 ' 460 465 



1365 



TGT GAA AAC TTT GAC ACC ACC ACT CTC OCT GCC CGG ATC OCC GAG AAC 1413 
Cys Glu Asn Phe Asp Thr Thr Ser Val Pro Ala Arg He Pro Glu Asn 
420 425 430 

OGA ATT CGG ACT ACG AGG TOC GTA GAA AAC TTG COC GAA TGT GCT ATT 1461 
Arg He Arg Ser Thr Arg Ser Val Glu Asn Leu Pro Glu Cys Gly lie 
435 440 445 



1509 



35 



AAC GAT GAT GAG GCC TGG TOG GTG GAT GAC ATA GGC GAG CTG CAA GTG 
Asn Asp Asp Glu Ala Trp Ser Val Asp Asp He Gly Glu Leu Gin Val 
470 475 480 



1557 



40 



GAG CTC OCC GAA GTG CAT ACC AAC AGO TGT GAC AAC ATC TOC CAG TTC 1605 
Glu Leu Pro Glu Val His Thr Asn Ser Cys Asp Asn He Ser Gin Hie 
485 490 495 

TCT GTG GAC AGC ATC ACC AGO CAG GAG AGO AAG GAG CCT GTG TTC ATT 1653 
Ser Val Asp Ser He Thr Ser Gin Glu Ser Lys Glu Pro Val Phe He 
500 505 510 



45 



GCA GCA GGG GAC ATC OGC OGG OGC CTT TOG GAA CAG CTG GCT CAT ACC 
Ala Ala Gly Asp He Arg Arg Arg Leu Ser Glu Gin Leu Ala His Thr 
515 520 525 



1701 



50 



COG ACA GCC TTC AAA CGA GAC OCA GAA GAT OCT TCT GCA GIT GCT CTC 
Pro Thr Ala Phe Lys Arg Asp Pro Glu Asp Pro Ser Ala Val Ala Leu 



1749 



55 
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10 



15 



530 535 540 545 

AAA GAG CCC TGG CAG GAG AAA GTA CX3G CGG ATC'AGA GAG GGC TCC OCC 1797 
Lys Glu Pro Trp Gin Glu Lys Val Arg Arg He Arg Glu Gly Ser Pro 
550 555 ~ 560 

TAG GGC CAT CTC CCC AAT TGG CGG CTC CTG TCA GTC ATT GTC AAG TGT 1845 
Tyr Gly His Leu Pro Asn Trp Arg Leu Leu Ser Val He Val Lys Cys 
565 570 575 

GGG GAT GAC CTT CGG CAA GAG CTT CTG GCC TTT CAG GIG TTG AAG CAA 1893 
Gly Asp Asp Leu Arg Gln Glu Leu Leu Ala Phe Gln Val Leu Lys Gln 
580 585 590 

CTG CAG TCC ATT TGG GAA CAG GAG CGA GTG CCC CTT TGG ATC AAG CCA 1941 
Leu Gln Ser Ile Trp Glu Gln Glu Arg Val Pro Leu Trp Ile Lys Pro 
595 600 605 

ATA CAA GAT TCT TGT GAA ATT ACG ACT GAT ACT GGC ATG ATT GAA CCA 1989 
Ile Gln Asp Ser Cys Glu Ile Thr Thr Asp Ser Gly Met Ile Glu Pro 
610 615 620 625 

GTG GTC AAT GCT CTG TCC ATC CAT CAG GTG AAG AAA CAG TCA CAG CTC 2037 
Val Val Asn Ala Val Ser Ile His Gln Val Lys Lys Gln Ser Gln Leu 

630 635 640 

TCC TTG CTC GAT TAC TTC CTA CAG GAG CAC GGC ACT TAG ACC ACT GAG 2085 
Ser Leu Leu Asp Tyr Phe Leu Gln Glu His Gly Ser Tyr Thr Thr Glu 
645 650 655 

GCA TTC CTC ACT GCA CAG CGC AAT TTT GTG CAA ACT TCT GCT GGG TAC 2133 
Ala Phe Leu Ser Ala Gln Arg Asn Phe Val Gln Ser Cys Ala Gly Tyr 
660 665 670 

TGC TTG GTC TGC TAC CTG CTG CAA CTC AAG GAC AGA CAC AAT GGG AAT 2181 
Cys Leu Val Cys Tyr Leu Leu Gln Val Lys Asp Arg His Asn Gly Asn 
675 680 685 

ATC CTT TTG GAC GCA GAA GGC CAC ATC ATC CAC ATC GAC TTT GGC TTC 2229 
Ile Leu Leu Asp Ala Glu Gly His Ile Ile His Ile Asp Phe Gly Phe 
690 695 700 705 



ATC CTC TCC AGC TCA CCC CGA AAT CTG GGC TTT GAG ACG TCA GCC TTT 2277 
Ile Leu Ser Ser Ser Pro Arg Asn Leu Gly Phe Glu Thr Ser Ala Phe 
710 715 720 

AAG CTG ACC ACA GAG TTT GTG GAT GTG ATG GGC GGC CTG GAT GGC GAC 2325 
Lys Leu Thr Thr Glu Phe Val Asp Val Met Gly Gly Leu Asp Gly Asp 
725 730 735 
ATG TTC AAC TAG TAT AAG ATG CTG ATG CTG CAA GGG CTG ATT GOC GCT 2373 
Met Phe Asn Tyr Tyr Lys Met Leu Met Leu Gln Gly Leu Ile Ala Ala 
740 745 750 

OGG AAA CAC ATG GAC AAG GTG GTG CAG ATC CTG GAG ATC ATG CAG CAA 2421 
Arg Lys His Met Asp Lys Val Val Gln Ile Val Glu Ile Met Gln Gln 
755 760 765 

GGT TCT CAG CTT OCT TGC TTC CAT GGC TCC AGC ACC ATT CGA AAC CTC 2469 
Gly Ser Gln Leu Pro Cys Phe His Gly Ser Ser Thr Ile Arg Asn Leu 
770 775 780 785 

AAA GAG AGG TTC CAC ATG AGC ATG ACT GAG GAG CAG CTG CAG CTG CTG 2517 
Lys Glu Arg Phe His Met Ser Met Thr Glu Glu Gln Leu Gln Leu Leu 
790 795 800 



CTG GAG CAG ATG GTG GAT GGC ACT ATG CGG TCT ATC ACC ACC AAA CTC 2565 
Val Glu Gln Met Val Asp Gly Ser Met Arg Ser Ile Thr Thr Lys Leu 
805 810 815 

TAT GAC GGC TTC CAG TAG CTC ACC AAC GGC ATC ATG TGA CAQXTCCTC 2614 
Tyr Asp Gly Phe Gln Tyr Leu Thr Asn Gly Ile Met * 
820 825 830 

AGCOCAGGAG TGGTGGGGGG TCCAGGGCAC CCTCOCTAGA GGGCCCITCT CTGAGAAACC 2674 

CCAAAOCAGG AAAOCCCACC TAO0CAAOCA T0CAC0CAAG GGAAATGGAA GGCAAGAAAC 2734 

ACGAAGGATC ATGTGGTAAC TGCGAGAGCT TGCTGAGGGG TGGGAGAGOC AGCTCTGGGG 2794 

TCCAGACTTG TTGGGGCTTC CCTOXCCTC CTGG?TCTGTG TCAGTATTAC CACCAGACTG 2854 

ACTCCAGGAC TCACTGCCCT CCAGAAAACA GAGGTGACAA ATGTGAGGGA CACTGGGGCC 2914 

TTTCTTCTCC TTGTAGGGGT CTCTCAGAGG TTCTTTOCAC AGGCCATCCT CTTATTCCGT 2974 

TCTGGGGCCC AGGAAGTGGG GAAGAGTAGG TTCTCGGTAC TTAGGACTTG ATCCTGTGCT 3034 

TGCCACTGGC CATGCTGCTG CCCAGCTCTA CC0CTC0CAG GGACCTACCC CTCCCAGGGA 3094 

CTGACCCCTG GCCCAAGCTC CCCTTGCTGG CGGGCGCTGC CTGGGOCCTG CACTTGCTGA 3154 

GCTT0C0CAT CATGGGCAAG GCAAGGGAAT TOXACAGCC CTOCAGTGTA CTGAGGGTAG 3214 

TGGCCTAGCC ATGTGGAATT CCCTACCCTG ACTCCTTOCC CAAACCCAGG GAAAAGAGCT 3274 

CTCAATTTTT TATTTTTAAT TTTTGTTTGA AATAAAGTCC TTAGTTAGOC 3324 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 810 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 



Met Pro Met Asp Leu Ile Leu Val Val Trp Phe Cys Val Cys Thr Ala 
1 5 10 15 

Arg Thr Val Val Gly Phe Gly Met Asp Pro Asp Leu Gln Met Asp Ile 
20 25 30 

Val Thr Glu Leu Asp Leu Val Asn Thr Thr Leu Gly Val Ala Gln Val 
35 40 45 

Ser Gly Met His Asn Ala Ser Lys Ala Phe Leu Phe Gln Asp Ile Glu 
50 55 60 

Arg Glu Ile His Ala Ala Pro His Val Ser Glu Lys Leu Ile Gln Leu 
65 70 75 80 

Phe Gln Asn Lys Ser Glu Phe Thr Ile Leu Ala Thr Val Gln Gln Lys 
85 90 95 

Pro Ser Thr Ser Gly Val Ile Leu Ser Ile Arg Glu Leu Glu His Ser 
100 105 110 

Tyr Phe Glu Leu Glu Ser Ser Gly Leu Arg Asp Glu Ile Arg Tyr His 
115 120 125 

Tyr Ile His Asn Gly Lys Pro Arg Thr Glu Ala Leu Pro Tyr Arg Met 
130 135 140 

Ala Asp Gly Gln Trp His Lys Val Ala Leu Ser Val Ser Ala Ser His 
145 150 155 160 

Leu Leu Leu His Val Asp Cys Asn Arg Ile Tyr Glu Arg Val Ile Asp 
165 170 175 

Pro Pro Asp Thr Asn Leu Pro Pro Gly Ile Asn Leu Trp Leu Gly Gln 
180 185 190 

Arg Asn Gln Lys His Gly Leu Phe Lys Gly Ile Ile Gln Asp Gly Lys 
195 200 205 

Ile Ile Phe Met Pro Asn Gly Tyr Ile Thr Gln Cys Pro Asn Leu Asn 
210 215 220 

His Thr Cys Pro Thr Cys Ser Asp Phe Leu Ser Leu Val Gln Gly Ile 
225 230 235 240 

Met Asp Leu Gln Glu Leu Leu Ala Lys Met Thr Ala Lys Leu Asn Tyr 
245 250 255 

Ala Glu Thr Arg Leu Ser Gln Leu Glu Asn Cys His Cys Glu Lys Thr 
260 265 270 

Cys Gln Val Ser Gly Leu Leu Tyr Arg Asp Gln Asp Ser Trp Val Asp 
275 280 285 

Gly Asp His Cys Arg Asn Cys Thr Cys Lys Ser Gly Ala Val Glu Cys 
290 295 300 

Arg Arg Met Ser Cys Pro Pro Leu Asn Cys Ser Pro Asp Ser Leu Pro 
305 310 315 320 

Val His Ile Ala Gly Gln Cys Cys Lys Val Cys Arg Pro Lys Cys Ile 
325 330 335 

Tyr Gly Gly Lys Val Leu Ala Glu Gly Gln Arg Ile Leu Thr Lys Ser 
340 345 350 

Cys Arg Glu Cys Arg Gly Gly Val Leu Val Lys Ile Thr Glu Met Cys 
355 360 365 

Pro Pro Leu Asn Cys Ser Glu Lys Asp His Ile Leu Pro Glu Asn Gln 
370 375 380 

Cys Cys Arg Val Cys Arg Gly His Asn Phe Cys Ala Glu Gly Pro Lys 
385 390 395 400 

Cys Gly Glu Asn Ser Glu Cys Lys Asn Trp Asn Thr Lys Ala Thr Cys 
405 410 415 

Glu Cys Lys Ser Gly Tyr Ile Ser Val Gln Gly Asp Ser Ala Tyr Cys 
420 425 430 

Glu Asp Ile Asp Glu Cys Ala Ala Lys Met His Tyr Cys His Ala Asn 
435 440 445 

Thr Val Cys Val Asn Leu Pro Gly Leu Tyr Arg Cys Asp Cys Val Pro 
450 455 460 

Gly Tyr Ile Arg Val Asp Asp Phe Ser Cys Thr Glu His Asp Glu Cys 
465 470 475 480 

Gly Ser Gly Gln His Asn Cys Asp Glu Asn Ala Ile Cys Thr Asn Thr 
485 490 495 



Val Gln Gly His Ser Cys Thr Cys Lys Pro Gly Tyr Val Gly Asn Gly 
500 505 510 

Thr Ile Cys Arg Ala Phe Cys Glu Glu Gly Cys Arg Tyr Gly Gly Thr 
515 520 525 

Cys Val Ala Pro Asn Lys Cys Val Cys Pro Ser Gly Phe Thr Gly Ser 
530 535 540 

His Cys Glu Lys Asp Ile Asp Glu Cys Ser Glu Gly Ile Ile Glu Cys 
545 550 555 560 

His Asn His Ser Arg Cys Val Asn Leu Pro Gly Trp Tyr His Cys Glu 
565 570 575 

Cys Arg Ser Gly Phe His Asp Asp Gly Thr Tyr Ser Leu Ser Gly Glu 
580 585 590 

Ser Cys Ile Asp Ile Asp Glu Cys Ala Leu Arg Thr His Thr Cys Trp 
595 600 605 

Asn Asp Ser Ala Cys Ile Asn Leu Ala Gly Gly Phe Asp Cys Leu Cys 
610 615 620 

Pro Ser Gly Pro Ser Cys Ser Gly Asp Cys Pro His Glu Gly Gly Leu 
625 630 635 640 

Lys His Asn Gly Gln Val Trp Thr Leu Lys Glu Asp Arg Cys Ser Val 
645 650 655 

Cys Ser Cys Lys Asp Gly Lys Ile Phe Cys Arg Arg Thr Ala Cys Asp 
660 665 670 

Cys Gln Asn Pro Ser Ala Asp Leu Phe Cys Cys Pro Glu Cys Asp Thr 
675 680 685 

Arg Val Thr Ser Gln Cys Leu Asp Gln Asn Gly His Lys Leu Tyr Arg 
690 695 700 

Ser Gly Asp Asn Trp Thr His Ser Cys Gln Gln Cys Arg Cys Leu Glu 
705 710 715 720 

Gly Glu Val Asp Cys Trp Pro Leu Thr Cys Pro Asn Leu Ser Cys Glu 
725 730 735 

Tyr Thr Ala Ile Leu Glu Gly Glu Cys Cys Pro Arg Cys Val Ser Asp 
740 745 750 




EP0 796 913 A2 



Pro Cys Leu Ala Asp Asn Ile Thr Tyr Asp Ile Arg Lys Thr Cys Leu 
755 760 765 

Asp Ser Tyr Gly Val Ser Arg Leu Ser Gly Ser Val Trp Thr Met Ala 
770 775 780 

Gly Ser Pro Cys Thr Thr Cys Lys Cys Lys Asn Gly Arg Val Cys Cys 
785 790 795 800 

Ser Val Asp Phe Glu Cys Leu Gln Asn Asn 
805 810 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2430 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

ATGCCGATGG ATTPGATTTT AGTTGTCTGG TTCTGTGTGT GCACTGOCAG GACAGTGGTC 60 

GGCTTTGGGA TGGACCCTGA CCTTCAGATG GATATCGTCA CCGAGCTTGA CCTTCTGAAC 120 

ACCACCCTTG GAGTTGCTCA GGTGTCTGGA ATGCACAATG CCAGCAAAGC ATTTTTATTT 180 

CAAGACATAG AAAGAGAGAT OCATGCAGCT CCTCATCTGA GTGAGAAATT AATTCAGCTG 240 

TTOCAGAACA AGAGTGAATT CACCATTTTG GCCACTGTAC AGCAGAAGCC ATCXZACTTCA 300 

GGACTGATAC TGTCCATTCG AGAACTGGAG CACAGCTATT TTGAACTGGA GAGCAGTGGC 360 

CTGAGGGATG AGATTCGGTA TCACTACATA CACAATGGGA AGCCAAGGAC AGAGGCACTT 420 

CCTTACCGCA TGGCAGATGG ACAATGGCAC AAGGTTGCAC TGTCAGTTAG CGCCTCTCAT 480 

CTCCTGCTCC ATGTCGACTG TAACAGGATT TATGAGCGTG TGATAGACCC TOCAGATACC 540 

AAOCTTOCCC CAGGAATCAA TTTATGGCTT GGCCAGCGCA AOCAAAAGCA TGGCTTATTC 600 

AAAGGGATCA TCCAAGATGG GAAGATCATC TTTATGCCGA ATGGATATAT AACACAGTGT 660 

CCAAATCTAA ATCACACTTG CCX^ACCTGC AGTGATTTCT TAAGCCTGGT GCAAGGAATA 720 

ATGGATTTAC AAGAGCTTTT GGQCAAGATG ACTGCAAAAC TAAATTATGC AGAGACAAGA 780 

CTTAGTCAAT TGGAAAACTG TCATTGTGAG AAGACTTGTC AAGTGACTGG ACTGCTCTAT 840 

CGAGATCAAG ACTCTTGGGT AGATGGTGAC CATTGCAGGA ACTGCACTTG CAAAAGTOGT 900 

GOOGTGGAAT GCOGAAGGAT GTCCTGTCOC CCTCTCAATT GCTCCCCAGA CTCOCTCCCA 960 

GTACACATTG CTGGCCAGTG CTGTAAGGTC TGOOGAOCAA AATCTATCTA TGGAGGAAAA 1020 

GTTCTTGCAG AAGGCCAGCG GATTTTAAGC AAGAGCTGTC GGGAATGCCG AGGTGGAGTT 1080 

TTAGTAAAAA TTACAGAAAT GTGTCCTCCT TTGAACTGCT CAGAAAAGGA TCACATTCTT 1140 

OCTGAGAATC AGTOCTGOOG TOTCTCTAGA GGTCATAACT TTTGTGCAGA AGGADCTAAA 1200 

TCTGGTGAAA ACTCAGAGTG CAAAAACTGG AATACAAAAG CTACTTGTGA GTGCAAGAGT 1260 

GGITACATCT CTGTOCAGGG AGACTCTGOC TACTGTGAAG ATATTGATGA GTGTGCAGCT 1320 

AAGATOCATT ACTGTCATGC CAATACTGTG TGTGTCAACC TTOCTGGGTT ATATOGCTGT 1380 

GACTGTGTCC CAGGATACAT TCGTGTGGAT GACTTCTCTT GTACAGAACA OGATGAATGT 1440 

GGCAGCGGCC AGCACAACTG TGATGAGAAT GCCATCTOCA CCAACACTGT CCAGGGACAC 1500 

AGCTGCAOCT GCAAAOOGGG CTACGTGGGG AAOGGGACCA TCTGCAGAGC TTTCTGTGAA 1560 

GAGGGCTGCA GATACGGTGG AAOGTG7TCTG GCTOOCAACA AATGTGTCTG TCCATCTGGA 1620 

TTCACAGGAA GOCACTGOGA GAAAGATATT GATGAATGTT CAGAGGGAAT CATTGAGTGC 1680 

CACAACCATT CXXX3CTGOGT TAAOCTGCCA GGGTGGTAOC ACTGTGAGTG CAGAAGOGGT 1740 

TTCCATGACG ATGGGAOCTA TTCACTGTOC GGGGACTOCT GTATTGACAT TGATGAATGT 1800 

GOCTTAAGAA CTCACAOCTG TTGGAACGAT TCTGCCTGCA TCAACCTGGC AGGGGGTTTT 1860 

GACTGTCTCT GCCOCTCTGG GCCCTCCTGC TCTGGTGACT GTCCTCATGA AGGGGGGCTG 1920 

AAGCACAATG GOCAGGTGTG GADCTTGAAA GAAGACAGCT GTTCTGTCTG CTOCTGCAAG 1980 

GATGGCAAGA TATTCTGCCG ACGGACAGCT TGTGATTGCC AGAATOCAAG TGCTGACCTA 2040 

TTCTGTTGCC CAGAATOTGA CACCAGAGTC ACAAGTCAAT GTTTAGAOCA AAATGGTCAC 2100 

AAGCTGTATC GAAGTGGAGA CAATTGGACC CATAGCTGTC AGCAGTGTCG GTGTCTGGAA 2160 

GGAGAGGTAG ATTGCTGGOC ACTCACTTGC OOCAACTTGA GCTGTGAGTA TACAGCTATC 2220 

TTAGAAGGGG AATGTTGTCC CCGCTGTGTC AGTGACCCCT GOCTAGCTGA TAACATCAGC 2280 

TATGACATCA GAAAAACTTG CCTGGACAGC TATGGTCTTT CAOGGCTTAG TGGCTCAGTG 2340 

TGGACGATGG CTGGATCTCC CTGCACAAOC TGTAAATGCA AGAATGGAAG AGTCTGTTGT 2400 

TCTGTGGATT TTGAGTGTCT TCAAAATAAT 2430 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2977 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-073E07 

(ix) FEATURE: 
(A) NAME/KEY: CDS 

(B) LOCATION: 103..2532 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 



TAGCAAGTTT GGCGGCTCCA AGCCAGGCGC GOCTCAGGAT CCAGGCTCAT TTGCTTCCAC 60 



CTAGCTTCGG TGCCCCCTGC TAGGCGGGGA OOCTOGAGAG CG ATG COG ATC GAT 114 

Met Pro Met Asp 

1 

TTG ATT TTA GTT CTG TOG TTC TGT GTG TGC ACT GOC AGG ACA GTG GTG 162 
Leu Ile Leu Val Val Trp Phe Cys Val Cys Thr Ala Arg Thr Val Val 
5 10 15 20 

GGC TIT GGG ATG GAC CCT GAC CTT CAG ATG GAT ATC GTC ACC GAG CTT 210 
Gly Phe Gly Met Asp Pro Asp Leu Gln Met Asp Ile Val Thr Glu Leu 
25 30 35 

GAC CTT GTG AAC ACC ACC CTT GGA GTT OCT CAG GTG TCT GGA ATG CAC 258 
Asp Leu Val Asn Thr Thr Leu Gly Val Ala Gln Val Ser Gly Met His 
40 45 50 

AAT GCC AGC AAA GCA TTT TTA TTT CAA GAC ATA GAA AGA GAG ATC CAT 306 
Asn Ala Ser Lys Ala Phe Leu Phe Gln Asp Ile Glu Arg Glu Ile His 
55 60 65 

GCA GCT OCT CAT GTG AGT GAG AAA TTA ATT CAG CTG TTC CAG AAC AAG 354 
Ala Ala Pro His Val Ser Glu Lys Leu Ile Gln Leu Phe Gln Asn Lys 
70 75 80 

ACT GAA TTC ACC ATT TTC GCC ACT GTA CAG CAG AAG CCA TCC ACT TCA 402 
Ser Glu Phe Thr Ile Leu Ala Thr Val Gln Gln Lys Pro Ser Thr Ser 
85 90 95 100 

GGA GTG ATA CTG TCC ATT CGA GAA CTG GAG CAC AGC TAT TTT GAA CTG 450 
Gly Val Ile Leu Ser Ile Arg Glu Leu Glu His Ser Tyr Phe Glu Leu 
105 110 115 

GAG AGC AGT GGC CTG AGG GAT GAG ATT CGG TAT CAC TAC ATA CAC AAT 498 
Glu Ser Ser Gly Leu Arg Asp Glu Ile Arg Tyr His Tyr Ile His Asn 
120 125 130 

GGG AAG CCA AGG ACA GAG GCA CTT CCT TAC CGC ATG GCA GAT GGA CAA 546 
Gly Lys Pro Arg Thr Glu Ala Leu Pro Tyr Arg Met Ala Asp Gly Gln 
135 140 145 

TGG CAC AAG GTT GCA CTG TCA GTT AGC GCC TCT CAT CTC CTG CTC CAT 594 
Trp His Lys Val Ala Leu Ser Val Ser Ala Ser His Leu Leu Leu His 
150 155 160 

GTC GAC TGT AAC AGG ATT TAT GAG CGT GTG ATA GAC CCT CCA GAT ACC 642 
Val Asp Cys Asn Arg Ile Tyr Glu Arg Val Ile Asp Pro Pro Asp Thr 
165 170 175 180 

AAC CTT CCC CCA GGA ATC AAT TTA TGG CTT GGC CAG CGC AAC CAA AAG 690 
Asn Leu Pro Pro Gly Ile Asn Leu Trp Leu Gly Gln Arg Asn Gln Lys 
185 190 195 

CAT GGC TTA TTC AAA GGG ATC ATC CAA GAT GGG AAG ATC ATC TTT ATG 738 
His Gly Leu Phe Lys Gly Ile Ile Gln Asp Gly Lys Ile Ile Phe Met 
200 205 210 

COG AAT GGA TAT ATA ACA CAG TGT CCA AAT CTA AAT CAC ACT TGC CCA 786 
Pro Asn Gly Tyr Ile Thr Gln Cys Pro Asn Leu Asn His Thr Cys Pro 
215 220 225 

ACC TGC AGT GAT TTC TTA AGC CTG GTG CAA GGA ATA ATG GAT TTA CAA 834 
Thr Cys Ser Asp Phe Leu Ser Leu Val Gln Gly Ile Met Asp Leu Gln 
230 235 240 
GAG CTT TTG GCC AAG ATG ACT GCA AAA CTA AAT TAT GCA GAG ACA AGA 882 
Glu Leu Leu Ala Lys Met Thr Ala Lys Leu Asn Tyr Ala Glu Thr Arg 
245 250 255 260 

CTT AGT CAA 1TG GAA AAC TGT CAT TGT GAG AAG ACT TGT CAA GTG AGT 930 
Leu Ser Gln Leu Glu Asn Cys His Cys Glu Lys Thr Cys Gln Val Ser 
265 270 275 

GGA CTG CTC TAT CGA GAT CAA GAC TCT TGG CTA GAT GGT GAC CAT TGC 978 
Gly Leu Leu Tyr Arg Asp Gln Asp Ser Trp Val Asp Gly Asp His Cys 
280 285 290 

AGG AAC TGC ACT TGC AAA AGT GGT GOC GTG GAA TGC CGA AGG ATG TCC 1026 
Arg Asn Cys Thr Cys Lys Ser Gly Ala Val Glu Cys Arg Arg Met Ser 
295 300 305 

TGT COC OCT CTC AAT TGC TOC CCA GAC TCC CTC CCA CTA CAC ATT GCT 1074 
Cys Pro Pro Leu Asn Cys Ser Pro Asp Ser Leu Pro Val His Ile Ala 
310 315 320 

GGC CAG TGC TGT AAG GTC TGC CGA CCA AAA TCT ATC TAT GGA GGA AAA 1122 
Gly Gln Cys Cys Lys Val Cys Arg Pro Lys Cys Ile Tyr Gly Gly Lys 
325 330 335 340 

GTT CTT GCA GAA GGC CAG CGG ATT TTA ACC AAG AGC TGT CGG GAA TGC 1170 
Val Leu Ala Glu Gly Gln Arg Ile Leu Thr Lys Ser Cys Arg Glu Cys 
345 350 355 

CGA GCT GGA GIT TTA CTA AAA ATT ACA GAA ATG TCT OCT CCT TTG AAC 1218 
Arg Gly Gly Val Leu Val Lys Ile Thr Glu Met Cys Pro Pro Leu Asn 
360 365 370 

TGC TCA GAA AAG GAT CAC ATT CTT CCT GAG AAT CAG TGC TGC CCT GTC 1266 
Cys Ser Glu Lys Asp His Ile Leu Pro Glu Asn Gln Cys Cys Arg Val 
375 380 385 

TCT AGA GCT CAT AAC TTT TCT GCA GAA GGA CCT AAA TGT GGT GAA AAC 1314 
Cys Arg Gly His Asn Phe Cys Ala Glu Gly Pro Lys Cys Gly Glu Asn 
390 395 400 

TCA GAG TGC AAA AAC TGG AAT ACA AAA GCT ACT TCT GAG TGC AAG ACT 1362 
Ser Glu Cys Lys Asn Trp Asn Thr Lys Ala Thr Cys Glu Cys Lys Ser 
405 410 415 420 

GCT TAC ATC TCT GTC CAG GGA GAC TCT GOC TAC TCT GAA GAT ATT GAT 1410 
Gly Tyr Ile Ser Val Gln Gly Asp Ser Ala Tyr Cys Glu Asp Ile Asp 
425 430 435 

GAG TCT GCA GCT AAG ATG CAT TAC TCT CAT GCC AAT ACT GTG TCT GTC 1458 
Glu Cys Ala Ala Lys Met His Tyr Cys His Ala Asn Thr Val Cys Val 
440 445 450 

AAC CTT OCT GGG TTA TAT CGC TGT GAC TGT GTC OCA GGA TAG ATT CGT 1506 
Asn Leu Pro Gly Leu Tyr Arg Cys Asp Cys Val Pro Gly Tyr Ile Arg 
455 460 465 

GTG GAT GAC TTC TCT TGT ACA GAA CAC GAT GAA TGT GGC AGC GGC CAG 1554 
Val Asp Asp Phe Ser Cys Thr Glu His Asp Glu Cys Gly Ser Gly Gln 
470 475 480 

CAC AAC TGT GAT GAG AAT GOC ATC TGC ACC AAC ACT GTC CAG GGA CAC 1602 
His Asn Cys Asp Glu Asn Ala Ile Cys Thr Asn Thr Val Gln Gly His 
485 490 495 500 

AGC TGC ACC TGC AAA COG GGC TAC GTG GGG AAC GGG ACC ATC TGC AGA 1650 
Ser Cys Thr Cys Lys Pro Gly Tyr Val Gly Asn Gly Thr Ile Cys Arg 
505 510 515 

GOT TTC TGT GAA GAG GGC TGC AGA TAC GGT GGA ACG TGT GTG OCT CCC 1698 
Ala Phe Cys Glu Glu Gly Cys Arg Tyr Gly Gly Thr Cys Val Ala Pro 
520 525 530 

AAC AAA TGT GTC TGT CCA TCT GGA TTC ACA GGA AGC CAC TGC GAG AAA 1746 
Asn Lys Cys Val Cys Pro Ser Gly Phe Thr Gly Ser His Cys Glu Lys 
535 540 545 

GAT ATT GAT GAA TGT TCA GAG GGA ATC ATT GAG TGC CAC AAC CAT TGC 1794 
Asp Ile Asp Glu Cys Ser Glu Gly Ile Ile Glu Cys His Asn His Ser 
550 555 560 

OGC TGC GTT AAC CTG CCA GGG TGG TAC CAC TGT GAG TGC AGA AGC GGT 1842 
Arg Cys Val Asn Leu Pro Gly Trp Tyr His Cys Glu Cys Arg Ser Gly 
565 570 575 580 

TTC CAT GAC GAT GGG ACC TAT TCA CTG TCC GGG GAG TCC TGT ATT GAC 1890 
Phe His Asp Asp Gly Thr Tyr Ser Leu Ser Gly Glu Ser Cys Ile Asp 
585 590 595 

ATT GAT GAA TGT GCC TTA AGA ACT CAC ACC TGT TGG AAC GAT TCT GOC 1938 
Ile Asp Glu Cys Ala Leu Arg Thr His Thr Cys Trp Asn Asp Ser Ala 
600 605 610 

TGC ATC AAC CTG GCA GGG GGT TIT GAC TGT CTC TGC CCC TCT GGG CCC 1986 
Cys Ile Asn Leu Ala Gly Gly Phe Asp Cys Leu Cys Pro Ser Gly Pro 
615 620 625 

TCC TGC TCT GGT GAC TGT OCT CAT GAA GGG GGG CTG AAG CAC AAT GGC 2034 
Ser Cys Ser Gly Asp Cys Pro His Glu Gly Gly Leu Lys His Asn Gly 
630 635 640 

CAG GTG TGG AOC TTG AAA GAA GAC AGG TGT TCT GTC TGC TCC TGC AAG 2082 
Gln Val Trp Thr Leu Lys Glu Asp Arg Cys Ser Val Cys Ser Cys Lys 
645 650 655 660 

GAT GGC AAG ATA TTC TGC OGA CGG ACA GCT TGT GAT TGC CAG AAT CCA 2130 
Asp Gly Lys Ile Phe Cys Arg Arg Thr Ala Cys Asp Cys Gln Asn Pro 
665 670 675 

AGT GCT GAC CTA TTC TGT TGC CCA GAA TGT GAC ACC AGA GTC ACA AGT 2178 
Ser Ala Asp Leu Phe Cys Cys Pro Glu Cys Asp Thr Arg Val Thr Ser 
680 685 690 

CAA TGT TTA GAC CAA AAT GGT CAC AAG CTG TAT CGA AGT GGA GAC AAT 2226 
Gln Cys Leu Asp Gln Asn Gly His Lys Leu Tyr Arg Ser Gly Asp Asn 
695 700 705 

TGG ACC CAT AGC TGT CAG CAG TCT CGG TGT CTG GAA GGA GAG GTA GAT 2274 
Trp Thr His Ser Cys Gln Gln Cys Arg Cys Leu Glu Gly Glu Val Asp 
710 715 720 

TGC TGG CCA CTC ACT TGC COC AAC TTG AGC TGT GAG TAT ACA GCT ATC 2322 
Cys Trp Pro Leu Thr Cys Pro Asn Leu Ser Cys Glu Tyr Thr Ala Ile 
725 730 735 740 

TTA GAA GGG GAA TGT TGT CCC CGC TGT GTC ACT GAC COC TGC CTA GCT 2370 
Leu Glu Gly Glu Cys Cys Pro Arg Cys Val Ser Asp Pro Cys Leu Ala 
745 750 755 

GAT AAC ATC AOC TAT GAC ATC AGA AAA ACT TGC CTG GAC AGC TAT GCT 2418 
Asp Asn Ile Thr Tyr Asp Ile Arg Lys Thr Cys Leu Asp Ser Tyr Gly 
760 765 770 

CTT TCA CGG CTT ACT GGC TCA GTG TGG AGG ATG GCT GGA TCT CCC TGC 2466 
Val Ser Arg Leu Ser Gly Ser Val Trp Thr Met Ala Gly Ser Pro Cys 
775 780 785 

ACA ACC TCT AAA TGC AAG AAT GGA AGA GTC TGT TGT TCT GTG GAT TTT 2514 
Thr Thr Cys Lys Cys Lys Asn Gly Arg Val Cys Cys Ser Val Asp Phe 
790 795 800 

GAG TCT CTT CAA AAT AAT TGAAGTATTT ACACTGGACT CAAOGCAGAA 2562 
Glu Cys Leu Gln Asn Asn 
805 810 

GAATGGAOGA AATGACCATC CAACGTGATT AAGGATAGGA ATCGGTAGTT TGGTTTTTTT 2622 

GTTTGTTTTG TTTTTTTAAC CACAGATAAT TGCCAAAGTT TCCACCTGAG GACGGTGTTT 2682 

CGGAGGTTGC CITITGGACC TACCACTTTG CTCATTCTTG CTAACCTAGT CTAGGTGACC 2742 

TACAGTGOOG TGCATTTAAG TCAATGGTTG TTAAAAGAAG TTTOOOGTGT TGTAAATCAT 2802 

GTTTCCCTTA TCAGATCATT TGCAAATACA TTTAAATGAT CTCATGGTAA ATQGTTGATG 2862 

TATTTTTTGG GTTTATTTTG TGTACTAAOC ATAATAGAGA GAGACTCAGC TCCTTTTATT 2922 

TATTTTCITG ATTTATGGAT CAAATTCTAA AATAAAGTTG CCTGTTGTGA CTTTT 2977 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 816 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 



Met Glu Ser Arg Val Leu Leu Arg Thr Phe Cys Leu Ile Phe Gly Leu 
1 5 10 15 

Gly Ala Val Trp Gly Leu Gly Val Asp Pro Ser Leu Gln Ile Asp Val 
20 25 30 

Leu Thr Glu Leu Glu Leu Gly Glu Ser Thr Thr Gly Val Arg Gln Val 
35 40 45 

Pro Gly Leu His Asn Gly Thr Lys Ala Phe Leu Phe Gln Asp Thr Pro 
50 55 60 

Arg Ser Ile Lys Ala Ser Thr Ala Thr Ala Glu Gln Phe Phe Gln Lys 
65 70 75 80 

Leu Arg Asn Lys His Glu Phe Thr Ile Leu Val Thr Leu Lys Gln Thr 
85 90 95 

His Leu Asn Ser Gly Val Ile Leu Ser Ile His His Leu Asp His Arg 
100 105 110 

Tyr Leu Glu Leu Glu Ser Ser Gly His Arg Asn Glu Val Arg Leu His 
115 120 125 

Tyr Arg Ser Gly Ser His Arg Pro His Thr Glu Val Phe Pro Tyr Ile 
130 135 140 

Leu Ala Asp Asp Lys Trp His Lys Leu Ser Leu Ala Ile Ser Ala Ser 
145 150 155 160 

His Leu Ile Leu His Ile Asp Cys Asn Lys Ile Tyr Glu Arg Val Val 
165 170 175 

Glu Lys Pro Ser Thr Asp Leu Pro Leu Gly Thr Thr Phe Trp Leu Gly 
180 185 190 

Gln Arg Asn Asn Ala His Gly Tyr Phe Lys Gly Ile Met Gln Asp Val 
195 200 205 

Gln Leu Leu Val Met Pro Gln Gly Phe Ile Ala Gln Cys Pro Asp Leu 
210 215 220 

Asn Arg Thr Cys Pro Thr Cys Asn Asp Phe His Gly Leu Val Gln Lys 
225 230 235 240 

Ile Met Glu Leu Gln Asp Ile Leu Ala Lys Thr Ser Ala Lys Leu Ser 
245 250 255 

Arg Ala Glu Gln Arg Met Asn Arg Leu Asp Gln Cys Tyr Cys Glu Arg 
260 265 270 

Thr Cys Thr Met Lys Gly Thr Thr Tyr Arg Glu Phe Glu Ser Trp Ile 
275 280 285 

Asp Gly Cys Lys Asn Cys Thr Cys Leu Asn Gly Thr Ile Gln Cys Glu 
290 295 300 

Thr Leu Ile Cys Pro Asn Pro Asp Cys Pro Leu Lys Ser Ala Leu Ala 
305 310 315 320 

Tyr Val Asp Gly Lys Cys Cys Lys Glu Cys Lys Ser Ile Cys Gln Phe 
325 330 335 

Gln Gly Arg Thr Tyr Phe Glu Gly Glu Arg Asn Thr Val Tyr Ser Ser 
340 345 350 

Ser Gly Val Cys Val Leu Tyr Glu Cys Lys Asp Gln Thr Met Lys Leu 
355 360 365 

Val Glu Ser Ser Gly Cys Pro Ala Leu Asp Cys Pro Glu Ser His Gln 
370 375 380 

Ile Thr Leu Ser His Ser Cys Cys Lys Val Cys Lys Gly Tyr Asp Phe 
385 390 395 400 

Cys Ser Glu Arg His Asn Cys Met Glu Asn Ser Ile Cys Arg Asn Leu 
405 410 415 

Asn Asp Arg Ala Val Cys Ser Cys Arg Asp Gly Phe Arg Ala Leu Arg 
420 425 430 

Glu Asp Asn Ala Tyr Cys Glu Asp Ile Asp Glu Cys Ala Glu Gly Arg 
435 440 445 

His Tyr Cys Arg Glu Asn Thr Met Cys Val Asn Thr Pro Gly Ser Phe 
450 455 460 

Met Cys Ile Cys Lys Thr Gly Tyr Ile Arg Ile Asp Asp Tyr Ser Cys 
465 470 475 480 

Thr Glu His Asp Glu Cys Ile Thr Asn Gln His Asn Cys Asp Glu Asn 
485 490 495 

Ala Leu Cys Phe Asn Thr Val Gly Gly His Asn Cys Val Cys Lys Pro 
500 505 510 

Gly Tyr Thr Gly Asn Gly Thr Thr Cys Lys Ala Phe Cys Lys Asp Gly 
515 520 525 

Cys Arg Asn Gly Gly Ala Cys Ile Ala Ala Asn Val Cys Ala Cys Pro 
530 535 540 

Gln Gly Phe Thr Gly Pro Ser Cys Glu Thr Asp Ile Asp Glu Cys Ser 
545 550 555 560 

Asp Gly Phe Val Gln Cys Asp Ser Arg Ala Asn Cys Ile Asn Leu Pro 
565 570 575 

Gly Trp Tyr His Cys Glu Cys Arg Asp Gly Tyr His Asp Asn Gly Met 
580 585 590 

Phe Ser Pro Ser Gly Glu Ser Cys Glu Asp Ile Asp Glu Cys Gly Thr 
595 600 605 

Gly Arg His Ser Cys Ala Asn Asp Thr Ile Cys Phe Asn Leu Asp Gly 
610 615 620 

Gly Tyr Asp Cys Arg Cys Pro His Gly Lys Asn Cys Thr Gly Asp Cys 
625 630 635 640 

Ile His Asp Gly Lys Val Lys His Asn Gly Gln Ile Trp Val Leu Glu 
645 650 655 

Asn Asp Arg Cys Ser Val Cys Ser Cys Gln Asn Gly Phe Val Met Cys 
660 665 670 

Arg Arg Met Val Cys Asp Cys Glu Asn Pro Thr Val Asp Leu Phe Cys 
675 680 685 



Cys Pro Glu Cys Asp Pro Arg Leu Ser Ser Gln Cys Leu His Gln Asn 
690 695 700 

Gly Glu Thr Leu Tyr Asn Ser Gly Asp Thr Trp Val Gln Asn Cys Gln 
705 710 715 720 

Gln Cys Arg Cys Leu Gln Gly Glu Val Asp Cys Trp Pro Leu Pro Cys 
725 730 735 

Pro Asp Val Glu Cys Glu Phe Ser Ile Leu Pro Glu Asn Glu Cys Cys 
740 745 750 

Pro Arg Cys Val Thr Asp Pro Cys Gln Ala Asp Thr Ile Arg Asn Asp 
755 760 765 

Ile Thr Lys Thr Cys Leu Asp Glu Met Asn Val Val Arg Phe Thr Gly 
770 775 780 

Ser Ser Trp Ile Lys His Gly Thr Glu Cys Thr Leu Cys Gln Cys Lys 
785 790 795 800 

Asn Gly His Ile Cys Cys Ser Val Asp Pro Gln Cys Leu Gln Glu Leu 
805 810 815 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2448 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 



ATGGAGTCTC GGGTCTTACT GAGAACATTC TGTTTGATCT TOGGTCTOGG AGCAGTTTGG 60 

GGGCTTGGTG TGGACCCTTC OCTACAGATT GACGTCTTAA CAGAGTTAGA ACTTGGGGAG 120 

TCCACGACCG GAGTGCGTCA GGTO00GGGG CTGCATAATG GGACGAAAGC CTTTCTCTTT 180 

CAAGATACTC CCAGAAGCAT AAAAGCATCC ACTGCTACAG CTGAACAGTT TTTTCAGAAG 240 

CTGAGAAATA AACATGAATT TACTAITTTG GTGACCCTAA AACAGACCCA CTTAAATTCA 300 

GGAGTTATTC TCTCAATTCA CCACTTGGAT CACAGGTACC TGGAACTGGA AACTAGTGGC 360 

CATOGGAATG AAGTCAGACT GCATTAOOGC TCAGGCAGTC ACOGOOCTCA CACAGAAGTG 420 

TTTOCTTACA TTTTGGCTGA TGACAACTGG CACAAGCTCT OCTTAGCCAT CAGTGCTTCC 480 

CATTTGATTT TACACATTGA CTGCAATAAA ATTTATGAAA GGGTAGTAGA AAAGCCCTCC 540 

ACAGACTTGC CTCTAGGCAC AACATTTTGG CTAGGACAGA GAAATAATGC GCATGGATAT 600 

TTTAAGGGTA TAATGCAAGA TGTOCAATTA CTTGTCATGC CXXAGGGATT TATTGCTCAG 660 

TGCXDCAGATC TTAATCGCAC CTGTOCAACT TGCAATGACT TOCATGGACT TGTGCAGAAA 720 

ATCATGGAGC TACAGGATAT TTTAGOCAAA ACATCAGOCA AGCTGTCTOG AGCTGAACAG 780 

CGAATGAATA GATTGGATCA GTGCTATTGT GAAAGGACTT GCAOCATGAA GGGAACCACC 840 

TACCGAGAAT TTGAGTCCTG GATAGAOGGC TGTAAGAACT GCACATGOCT GAATGGAACC 900 

ATCCACTGTG AAACTCTAAT CTGOOCAAAT CCTGACTGOC CACTTAAGTC GGCTCTTGCG 960 

TATOTGGATG GCAAATGCTG TAAGGAATGC AAATOGATAT GCCAATTTCA AGGACGAACC 1020 

TACTTTGAAG GAGAAAGAAA TACACTCTAT TOCTCTTCTG GACTATGTGT TCTCTATGAG 1080 

TGCAAGGACC AGACCATGAA AC1TUTTGAG AGTTCAGGCT GTCCAGCTTT GGATTOTOCA 1140 

GAGTCTCATC AGATAAOCTT GTCTCACAGC TGTTGCAAAG TTTGTAAAGG TTATGACTTT 1200 

TGTTCTGAAA GGCATAACTG CATGGAGAAT TOCATCTGCA GAAATCTGAA TGAJCAGGGCT 1260 

GTTTGTAGCT GTCGAGATGG TTTTAGGGCT CTTOGAGAGG ATAATGOCTA CTGTGAAGAC 1320 

ATCGATGACT GTGCTGAAGG GGGOCATTAC TGTOGTGAAA ATACAATGTG TGTCAACAOC 1380 

TTATGTGCAT CTGCAAAACT GGATACATCA GAATTGATGA TTATTCATGT 1440 

ACAGAACATG ATGAGTGTAT CACAAATCAG CACAACTCTG ATGAAAATGC TTTATGCTTC 1500 

AACACTGTTG GAGGACACAA CTGTCTTTGC AAGOOGGGCT ATACAGGGAA TGGAAOGACA 1560 

TGCAAAGCAT TTTGCAAAGA TGGCTGTAGG AATGGAGGAG CCPCTATTGC OGCTAATGTG 1620 

PAPA Af2f3PTT AAA* 1 vjr luAm UjyntAl Ion 1680 

GATGGTTTTG TTCAATGTGA CAGTOGTGCT AATTGCATTA AOJraXTGG ATQGTACCAC 1740 

TGTGAGTGCA GAGATGGCTA OCATGACAAT GGGATGTTTT CAOCAAGTGG AGAATOGTGT 1800 

GAAGATATTG ATGAGTGTGG GACCGGGAGG CACAGCTGTG (XAATGATAC CATTTGCTTC 1860 

AATTTGGATG GCGGATATGA TTGTOGATGT CCTCATGGAA AGAATTGCAC AGGGGACTGC 1920 

ATOCATGATG GAAAAGTTAA GCACAATGGT CAGATTTQQG TGTTGGAAAA TGACAGGTGC 1980 

TCTGrTCTGCT CATGTCAGAA TGGATTCCTT ATGTGTCGAC GGATGGTCTG TGACTGTGAG 2040 

AATOOCACAG TTCATCTTTT TTGCTGCCCT GAATGTGAOC CAAGGCTTAG TAGTCAGTGC 2100 

CTOCATCAAA ATGGGGAAAC TTTCTATAAC AGTGGTGACA (XTGGGTCCA GAATTGTCAA 2160 

CAGTGCCGCT GCTTGCAAGG GGAAGTTGAT TGTTQGOCCC TGCCTTGCOC AGATGTGGAG 2220 

TGTGAATTCA GC^TTCTOOC AGAGAATGAG TGCTGOOOGC GCTGTGTCAC AGAOCCTTGC 2280 

CAGGCTGACA OCATOOGCAA TGACATCAOC AAGACTTGOC TOGADGAAAT GAATCTGGTT 2340 

OGCTTCAOCG GGTOCTCTTG GATCAAACAT GGCACTGAGT GTACTCTCTG OCAGTGCAAG 2400 

AATGGOCACA TCTCTTGCTC AGTGGATCCA CAGTGCCTTC AGGAACTG 2448 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-093E05 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 97..2544 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

TTGGGAGGAG CAGTCTCTOC GCTCGTCTCC CGGAGCTTTC TCXATTGTCT CTGCCTTTAC 60 

AACAGAGGGA GACGATGGAC TGAGCTGATC CGCACC ATG GAG TCT CGG GTC TTA 114 

Met Glu Ser Arg Val Leu 

CTG AGA ACA TTC TGT TTG ATC TTC GCT CTC GGA GCA GTT TGG GGG CTT 162 
Leu Arg Thr Phe Cys Leu Ile Phe Gly Leu Gly Ala Val Trp Gly Leu 
10 15 20 

GGT GTG GAC OCT TCC CTA CAG ATT GAC GTC TTA ACA GAG TTA GAA CTT 210 
Gly Val Asp Pro Ser Leu Gln Ile Asp Val Leu Thr Glu Leu Glu Leu 
25 30 35 

GGG GAG TOC ACG AOC GGA GTG OCT CAG GTC COG GGG CTG CAT AAT GGG 258 
Gly Glu Ser Thr Thr Gly Val Arg Gln Val Pro Gly Leu His Asn Gly 
40 45 50 

ACG AAA G0C TTT CTC TIT CAA GAT ACT COC AGA AGO ATA AAA GCA TOC 306 
Thr Lys Ala Phe Leu Phe Gln Asp Thr Pro Arg Ser Ile Lys Ala Ser 
55 60 65 70 

ACT GCT ACA GCT GAA CAG TTT TTT CAG AAG CTG AGA AAT AAA CAT GAA 354 
Thr Ala Thr Ala Glu Gln Phe Phe Gln Lys Leu Arg Asn Lys His Glu 
75 80 85 

TTT ACT ATT TTG GTG ADC CTA AAA CAG ADC CAD TTA AAT TCA GGA GTT 402 
Phe Thr Ile Leu Val Thr Leu Lys Gln Thr His Leu Asn Ser Gly Val 
90 95 100 

ATT CTC TCA ATT CAC CAC TTG GAT CAC AGG TAC CTG GAA CTG GAA ACT 450 
Ile Leu Ser Ile His His Leu Asp His Arg Tyr Leu Glu Leu Glu Ser 
105 110 115 

ACT GGC CAT OGG AAT GAA GTC AGA CTG CAT TAC CGC TCA GGC ACT CAC 498 
Ser Gly His Arg Asn Glu Val Arg Leu His Tyr Arg Ser Gly Ser His 
120 125 130 

CGC OCT CAC ACA GAA GTG TTT COT TAC ATT TTG GCT GAT GAC AAG TGG 546 
Arg Pro His Thr Glu Val Phe Pro Tyr Ile Leu Ala Asp Asp Lys Trp 
135 140 145 150 

CAC AAG CTC TCC TTA GCC ATC ACT GCT TCC CAT TTG ATT TTA CAC ATT 594 
His Lys Leu Ser Leu Ala Ile Ser Ala Ser His Leu Ile Leu His Ile 
155 160 165 

GAC TGC AAT AAA ATT TAT GAA AGG CTA CTA GAA AAG 00C TOC ACA GAC 642 
Asp Cys Asn Lys Ile Tyr Glu Arg Val Val Glu Lys Pro Ser Thr Asp 
170 175 180 

TTG OCT CTA GGC ACA ACA TTT TGG CTA GGA CAG AGA AAT AAT GOG CAT 690 
Leu Pro Leu Gly Thr Thr Phe Trp Leu Gly Gln Arg Asn Asn Ala His 
185 190 195 
GGA TAT TTT AAG GGT ATA ATG CAA GAT GTC CAA TTA CTT GTC ATG CCC 738 
Gly Tyr Phe Lys Gly Ile Met Gln Asp Val Gln Leu Leu Val Met Pro 
200 205 210 

CAG GGA TTT ATT GCT CAG TGC CCA GAT CTT AAT CGC AOC TGT CCA ACT 786 
Gln Gly Phe Ile Ala Gln Cys Pro Asp Leu Asn Arg Thr Cys Pro Thr 
215 220 225 230 

TGC AAT GAC TTC CAT GGA CTT GTG CAG AAA ATC ATG GAG CTA CAG GAT 834 
Cys Asn Asp Phe His Gly Leu Val Gln Lys Ile Met Glu Leu Gln Asp 
235 240 245 

ATT TTA GCC AAA ACA TCA GOC AAG CTG TCT GGA GCT GAA CAG CGA ATG 882 
Ile Leu Ala Lys Thr Ser Ala Lys Leu Ser Arg Ala Glu Gln Arg Met 
250 255 260 

AAT AGA TTG GAT CAG TGC TAT TGT GAA AGG ACT TGC AOC ATG AAG GGA 930 
Asn Arg Leu Asp Gln Cys Tyr Cys Glu Arg Thr Cys Thr Met Lys Gly 
265 270 275 

ACC AOC TAC CGA GAA TTT GAG TCC TGG ATA GAC GGC TGT AAG AAC TGC 978 
Thr Thr Tyr Arg Glu Phe Glu Ser Trp Ile Asp Gly Cys Lys Asn Cys 
280 285 290 

^ ACA TGC CTG AAT GGA ADC ATC CAG TCT GAA ACT CTA ATC TGC CCA AAT 1026 

Thr Cys Leu Asn Gly Thr He Gin Cys Glu Thr Leu He Cys Pro Asn 
295 300 305 310 

OCT GAC TGC CCA CTT AAG TOG GCT CTT GOG TAT GTG GAT GGC AAA TGC 1074 
Pro Asp Cys Pro Leu Lys Ser Ala Leu Ala Tyr Val Asp Gly Lys Cys 
315 320 325 

TGT AAG GAA TGC AAA TOG ATA TGC CAA TTT CAA GGA CGA AOC TAC TTT 1122 
Cys Lys Glu Cys Lys Ser He Cys Gin Phe Gin Gly Arg Thr Tyr Phe 
330 335 340 

GAA GGA GAA AGA AAT ACA GTC TAT TOO TCT TCT GGA GTA TGT CTT CTC 1170 
Glu Gly Glu Arg Asn Thr Val Tyr Ser Ser Ser Gly Val Cys Val Leu> 
345 350 355 

TAT GAG TGC AAG GAC CAG ACC ATG AAA CTT GTT GAG ACT TCA GGC TCT 1218 
Tyr Glu Cys Lys Asp Gin Thr Met Lys Leu Val Glu Ser Ser Gly Cys 
360 365 370 

45 OCA GCT TTG GAT TGT CCA GAG TCT CAT CAG ATA AOC TTG TCT CAC AGO 1266 

Pro Ala Leu Asp Cys Pro Glu Ser His Gin He Thr Leu Ser His Ser 
375 380 385 390 

TCT TGC AAA GTT TCT AAA GGT TAT GAC TTT TGT TCT GAA AGG CAT AAC 1314 
50 Cys Cys Lys Val Cys Lys Gly Tyr Asp Phe Cys Ser Glu Arg His Asn 
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395 



400 



405 



TGC ATG GAG AAT TOC ATC TGC AGA AAT CTG AAT GAC AGG GCT GTT TGT
Cys Met Glu Asn Ser Ile Cys Arg Asn Leu Asn Asp Arg Ala Val Cys
410 415 420

AGC TCT OGA GAT GGT TTT AGG GCT CTT CGA GAG GAT AAT GOC TAG TGT
Ser Cys Arg Asp Gly Phe Arg Ala Leu Arg Glu Asp Asn Ala Tyr Cys
425 430 435

GAA GAC ATC GAT GAG TGT GCT GAA GGG CGC CAT TAC TGT CGT GAA AAT
Glu Asp Ile Asp Glu Cys Ala Glu Gly Arg His Tyr Cys Arg Glu Asn
440 445 450

ACA ATG TCT GTC AAC ADC COG GGT TCT TTT ATG TGC ATC TGC AAA ACT
Thr Met Cys Val Asn Thr Pro Gly Ser Phe Met Cys Ile Cys Lys Thr
455 460 465 470

GGA TAC ATC AGA ATT GAT GAT TAT TCA TGT ACA GAA CAT GAT GAG TGT
Gly Tyr Ile Arg Ile Asp Asp Tyr Ser Cys Thr Glu His Asp Glu Cys
475 480 485

ATC ACA AAT CAG CAC AAC TOT GAT GAA AAT GCT TTA TGC TTC AAC ACT
Ile Thr Asn Gln His Asn Cys Asp Glu Asn Ala Leu Cys Phe Asn Thr
490 495 500

GTT GGA GGA CAC AAC TGT GTT TGC AAG COG GGC TAT ACA GGG AAT GGA
Val Gly Gly His Asn Cys Val Cys Lys Pro Gly Tyr Thr Gly Asn Gly
505 510 515

ADG ACA TGC AAA GCA TTT TGC AAA GAT GGC TGT AGG AAT GGA GGA GCC
Thr Thr Cys Lys Ala Phe Cys Lys Asp Gly Cys Arg Asn Gly Gly Ala
520 525 530

TGT ATT GCC GCT AAT GTG TGT GCC TGC CCA CAA GGC TTC ACT GGA CCC
Cys Ile Ala Ala Asn Val Cys Ala Cys Pro Gln Gly Phe Thr Gly Pro
535 540 545 550

AGC TGT GAA ADG GAC ATT GAT GAA TGC TCT GAT GCT TTT GTT CAA TGT
Ser Cys Glu Thr Asp Ile Asp Glu Cys Ser Asp Gly Phe Val Gln Cys
555 560 565

GAC ACT CGT GCT AAT TGC ATT AAC CTG OCT GGA TGG TAC CAC TGT GAG
Asp Ser Arg Ala Asn Cys Ile Asn Leu Pro Gly Trp Tyr His Cys Glu
570 575 580

TGC AGA GAT GGC TAC CAT GAC AAT GGG ATG TTT TCA CCA ACT GGA GAA
Cys Arg Asp Gly Tyr His Asp Asn Gly Met Phe Ser Pro Ser Gly Glu
585 590 595

795 800 805

TCA GTG GAT OCA CAG TGC CTT CAG GAA CTG TGAAGTTAAC TGTCTCATGG 2564
Ser Val Asp Pro Gln Cys Leu Gln Glu Leu
810 815

GAGATTTCTG TTAAAAGAAT GTTCTTTCAT TAAAAGACCA AAAAGAAGTT AAAACTTAAA 2624 

TTGGGTGAIT TCIGGGCAGC TAAATGCAGC TTTGTTAATA GCTGAGTGAA CTTTCAATTA 2684 

TGAAATTTCT GGAGCTTGAC AAAATCACAA AAGGAAAATT ACTGGGGCAA AATTAGAOCT 2744 

CAAGTCTGOC TCTACTGTGT CTCACATCAC CATGTAGAAG AATGGGOGTA CAGTATATAC 2804 

CGTGACATOC TGAAOOCTGG ATAGAAAGOC TGAGOOCATT GGATCTGTGA AAQOCTCTAG 2864 

CTTCACTGCT GCAGAAAA1T TTOCTCTAGA TCAGAATCTT CAGAATCAGT TAGGTTOCTC 2924 

ACTGCAAGAA ATAAAATGTC AGGCAGTGAA TGAATTATAT TTTCAGAAGT AAAGCAAAGA 2984

AGCTATAACA TGTTATGTAC AGTACACTCT GAAAAGAAAT CTGAAACAAG TTATTGTAAT 3044 

GATAAAAATA ATGCACAGGC ATGGTTACTT AATATTTTCT AACAGGAAAA CTCATGOCTA 3104 

rrrocTTOrr ttactgcact taatattatt tggttgaait tcttcagtat aagctogttc 3164 

TTCTGCAAAA TTAAATAAAT ATTTCTCTTA OC1T 3198 



(2) INFORMATION FOR SEQ ID NO: 40: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 499 amino acids 
(B) TYPE: amino acid

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 



Met Glu Leu Ser Glu Pro Val Val Glu Asn Gly Glu Val Glu Met Ala
1 5 10 15

Leu Glu Glu Ser Trp Glu His Ser Lys Glu Val Ser Glu Ala Glu Pro
20 25 30

Gly Gly Gly Ser Ser Gly Asp Ser Gly Pro Pro Glu Glu Ser Gly Gln
35 40 45

Glu Met Met Glu Glu Lys Glu Glu Ile Arg Lys Ser Lys Ser Val Ile
50 55 60

Val Pro Ser Gly Ala Pro Lys Lys Glu His Val Asn Val Val Phe Ile
65 70 75 80

Gly His Val Asp Ala Gly Lys Ser Thr Ile Gly Gly Gln Ile Met Phe
85 90 95

Leu Thr Gly Met Ala Asp Lys Arg Thr Leu Glu Lys Tyr Glu Arg Glu
100 105 110

Ala Glu Glu Lys Asn Arg Glu Thr Trp Tyr Leu Ser Trp Ala Leu Asp
115 120 125

Thr Asn Gln Glu Glu Arg Asp Lys Gly Lys Thr Val Glu Val Gly Arg
130 135 140

Ala Tyr Phe Glu Thr Glu Arg Lys His Phe Thr Ile Leu Asp Ala Pro
145 150 155 160

Gly His Lys Ser Phe Val Pro Asn Met Ile Gly Gly Ala Ser Gln Ala
165 170 175

Asp Leu Ala Val Leu Val Ile Ser Ala Arg Lys Gly Glu Phe Glu Thr
180 185 190

Gly Phe Glu Lys Gly Gly Gln Thr Arg Glu His Ala Met Phe Gly Lys
195 200 205

Thr Ala Gly Val Lys His Leu Ile Val Leu Ile Asn Lys Met Asp Asp
210 215 220

Pro Thr Val Asn Trp Gly Ile Glu Arg Tyr Glu Glu Cys Lys Glu Lys
225 230 235 240

Leu Val Pro Phe Leu Lys Lys Val Gly Phe Ser Pro Lys Lys Asp Ile
245 250 255

His Phe Met Pro Cys Ser Gly Leu Thr Gly Ala Asn Ile Lys Glu Gln
260 265 270

Ser Asp Phe Cys Pro Trp Tyr Thr Gly Leu Pro Phe Ile Pro Tyr Leu
275 280 285

Asn Asn Leu Pro Asn Phe Asn Arg Ser Ile Asp Gly Pro Ile Arg Leu
290 295 300

Pro Ile Val Asp Lys Tyr Lys Asp Met Gly Thr Val Val Leu Gly Lys
305 310 315 320

Leu Glu Ser Gly Ser Ile Phe Lys Gly Gln Gln Leu Val Met Met Pro
325 330 335

Asn Lys His Asn Val Glu Val Leu Gly Ile Leu Ser Asp Asp Thr Glu
340 345 350

Thr Asp Phe Val Ala Pro Gly Glu Asn Leu Lys Ile Arg Leu Lys Gly
355 360 365

Ile Glu Glu Glu Glu Ile Leu Pro Glu Phe Ile Leu Cys Asp Pro Ser
370 375 380

Asn Leu Cys His Ser Gly Arg Thr Phe Asp Val Gln Ile Val Ile Ile
385 390 395 400

Glu His Lys Ser Ile Ile Cys Pro Gly Tyr Asn Ala Val Leu His Ile
405 410 415

His Thr Cys Ile Glu Glu Val Glu Ile Thr Ala Leu Ile Ser Leu Val
420 425 430

Asp Lys Lys Ser Gly Glu Lys Ser Lys Thr Arg Pro Arg Phe Val Lys
435 440 445

Gln Asp Gln Val Cys Ile Ala Arg Leu Arg Thr Ala Gly Thr Ile Cys
450 455 460

Leu Glu Thr Phe Lys Asp Phe Pro Gln Met Gly Arg Phe Thr Leu Arg
465 470 475 480

Asp Glu Gly Lys Thr Ile Ala Ile Gly Lys Val Leu Lys Leu Val Pro
485 490 495

Glu Lys Asp



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1497 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

ATGGAACTTT CAGAACCTGT TGTAGAAAAT GGAGAGGTGG AAATGGCCCT AGAAGAATCA
TGGGAGCACA GTAAAGAAGT AAGTGAAGCC GAGOCTGGGG GTGGTTCCTC GGGAGATTCA 120
GGGCCCCCAG AAGAAAGTGG CCAGGAAATG ATGGAGGAAA AAGAGGAAAT AAGAAAATOC 180
AAATCTCTGA TOGTACCCTC AGGTGCACCT AAGAAAGAAC AOGTAAATGT AGTATOCATT 240
GGCCATGTAG AOGCTGGCAA GTCAACCATC GGAGGACAGA TAATGTTTTT GACTGGAATG 300
GCTGACAAAA GAACACTGGA GAAATATGAA AGAGAAGCTG AGGAAAAAAA CAGAGAAAOC 360
TGGTATTTGT CCTGGGCCTT AGATACAAAT CAGGAGGAAC GAGACAAGGG TAAAAGAGTC 420
GAAGTGGGTC GTCCCTATTT TGAAACAGAA AGGAAACATT TCACAATTTT AGATGOCCCT 480
GGOCACAAGA GTTTTGTOOC AAATATGATT GGTOGTGCTT CTCAAGCTGA TTTGQCTGTG 540
CTGGTCATCT CTGOCAGGAA AGGAGAGTTT GAAACTGGAT TTGAAAAAGG TGGACAGACA 600
AGAGAACATG OGATGTTTGG CAAAACGGCA GGAGTAAAAC ATTTAATAGT GCTTATTAAT 660
AAGATGGATG ATOOCACAGT AAATTGGGGC ATCGAGAGAT ATGAAGAATG TAAAGAAAAA 720
CTOGTGOCCT TTTTGAAAAA AGTAGGCTTT ACTCCAAAAA AGGACATTCA CTTTATGOOC 780
TGCTCAGGAC TGACOGGAGC AAATATTAAA GAGCAGTCAG ATTTCTGOOC TTGGTACACT 840
GGATTACCAT TTATTOOGTA TTTGAATAAC TTGOCAAACT TCAACAGATC AATTGATGGA 900
CCAATAAGAC TGCCAATTGT GGATAAGTAC AAAGATATGG GCACTGTGGT OCTGGGAAAG 960
CTGGAATOCG GGTOCATTTT TAAAGGGGAG CAGCTCGTGA TGATGCCAAA CAAGCACAAT 1020
CTAGAAGTTC TTGGAATACT TTCTGATGAT ACTGAAACTG ATTTTGTAGC OOCAGGTGAA 1080
AAOCTCAAAA TCAGACTGAA GGGAATTGAA GAAGAAGAGA TTCTTCCAGA ATTCATACTT 1140
TGTGATCCTA CTAACCTCTG OCATTCTGGA OGCACGTTTG ATGTTCAGAT AGTGATTATT 1200
GAGCACAAAT OCATCATCTC CCCAGGTTAT AATGCGGTGC TGCACATTCA TACTTGTATT 1260
GAGGAAGTTG AGATAACAGT AAAAATCAGG 1370
AAGACAOGAC cotocttoct GAAACAAGAT CAAGTATGCA TTGCTOGTTT AAGGACAGCA 1380
GGAAGCATCT GOCTOGAGAC GTTCAAAGAT TITOCTCAGA TGGGTOGTTT TACTTTAAGA 1440
GATGAGGGTA AGACCATTGC AATTGGAAAA GTTCTGAAAT TGGTOOCAGA GAAGGAC 1497
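SEQ ID NO: 41 (1497 bases) appears to be the open reading frame corresponding to the 499-residue sequence of SEQ ID NO: 40. The following is a minimal sketch of how a reader might spot-check that correspondence, assuming the standard genetic code. The function name `translate` and the table-building idiom are illustrative choices and are not part of the specification. Only the first 60 bases of SEQ ID NO: 41 are used.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter codes, matching the notation of the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna):
    """Translate a DNA string (whitespace ignored) into three-letter residues."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna), 3)]


# First row of SEQ ID NO: 41 as printed in the listing.
first_row = "ATGGAACTTT CAGAACCTGT TGTAGAAAAT GGAGAGGTGG AAATGGCCCT AGAAGAATCA"
print(" ".join(translate(first_row)))
```

Under these assumptions the output can be compared by eye with residues 1 to 20 of SEQ ID NO: 40. Rows containing characters other than A, C, G and T would raise a `KeyError` in this sketch rather than being silently translated.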



(2) INFORMATION FOR SEQ ID NO: 42:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2057 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: Human fetal brain cDNA library 

(B) CLONE: GEN-077A09 



(ix) FEATURE: 

(A) NAME/KEY: CDS 
(B) LOCATION: 144..1640
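The CDS location can be cross-checked against the lengths given elsewhere in the listing (1497 base pairs for SEQ ID NO: 41 and 499 amino acids for SEQ ID NO: 40). The sketch below assumes the usual convention that feature locations are 1-based and inclusive. The helper names `cds_length` and `codon_count` are hypothetical and are introduced only for illustration.

```python
def cds_length(start, end):
    """Length in bases of a feature given 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def codon_count(start, end):
    """Number of complete codons in the feature; rejects partial codons."""
    length = cds_length(start, end)
    if length % 3 != 0:
        raise ValueError("feature length is not a multiple of three")
    return length // 3


# Values taken from the listing: CDS 144..1640 within a 2057-base sequence.
CDS_START, CDS_END = 144, 1640

print(cds_length(CDS_START, CDS_END))   # bases spanned by the CDS
print(codon_count(CDS_START, CDS_END))  # codons, one per encoded residue
```

On these figures the span is 1497 bases, or 499 codons, which agrees with the stated lengths and suggests that the CDS range as given ends at the last sense codon rather than including the stop codon.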

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

TCCCGGCCGG CTCCGGCAGC AACGATGAAG CCTGCADCGG CGCGGGATAC CCTCAAGGTA 60

AAAGGATGGG ACGGGGGGCA CCTGTGGAAC CTPOOCGAGA GGAADOGTTA GTGTCGCTTG 120

AAGGTTOCAA TTCAGCCGTT ACC ATG GAA CTT TCA GAA OCT GTT GTA GAA 170
Met Glu Leu Ser Glu Pro Val Val Glu
1 5

AAT GGA GAG GTG GAA ATG GCC CTA GAA GAA TCA TOG GAG CAC AGT AAA 218
Asn Gly Glu Val Glu Met Ala Leu Glu Glu Ser Trp Glu His Ser Lys
10 15 20 25

GAA GTA AGT GAA GCC GAG CCT GGG GGT GGT TCC TOG GGA GAT TCA GGG 266
Glu Val Ser Glu Ala Glu Pro Gly Gly Gly Ser Ser Gly Asp Ser Gly
30 35 40

OCC CCA GAA GAA AGT GGC CAG GAA ATG ATG GAG GAA AAA GAG GAA ATA 314
Pro Pro Glu Glu Ser Gly Gln Glu Met Met Glu Glu Lys Glu Glu Ile
45 50 55

AGA AAA TCC AAA TCT GTG ATC GTA CCC TCA GGT GCA CCT AAG AAA GAA 362
Arg Lys Ser Lys Ser Val Ile Val Pro Ser Gly Ala Pro Lys Lys Glu
60 65 70

CAC GTA AAT GTA GTA TTC ATT GGC CAT GTA GAC GCT GGC AAG TCA ACC 410
His Val Asn Val Val Phe Ile Gly His Val Asp Ala Gly Lys Ser Thr
75 80 85

ATC GGA GGA CAG ATA ATG TTT TTG ACT GGA ATG GCT GAC AAA AGA ACA 458
Ile Gly Gly Gln Ile Met Phe Leu Thr Gly Met Ala Asp Lys Arg Thr
90 95 100 105

CTG GAG AAA TAT GAA AGA GAA GCT GAG GAA AAA AAC AGA GAA ACC TGG 506
Leu Glu Lys Tyr Glu Arg Glu Ala Glu Glu Lys Asn Arg Glu Thr Trp
110 115 120

TAT TTG TCC TGG GCC TTA GAT ACA AAT CAG GAG GAA GGA GAC AAG GGT 554
Tyr Leu Ser Trp Ala Leu Asp Thr Asn Gln Glu Glu Arg Asp Lys Gly
125 130 135

AAA ACA GTC GAA GTG GGT CGT GCC TAT TTT GAA ACA GAA AGG AAA CAT 602
Lys Thr Val Glu Val Gly Arg Ala Tyr Phe Glu Thr Glu Arg Lys His
140 145 150

TTC ACA ATT TTA GAT GCC OCT GGC CAC AAG AGT TTT GTC CCA AAT ATG 650
Phe Thr Ile Leu Asp Ala Pro Gly His Lys Ser Phe Val Pro Asn Met
155 160 165

ATT GGT GGT GCT TCT CAA GCT GAT TTG GCT GTG CTG GTC ATC TCT GCC 698
Ile Gly Gly Ala Ser Gln Ala Asp Leu Ala Val Leu Val Ile Ser Ala
170 175 180 185

AGG AAA GGA GAG TTT GAA ACT GGA TTT GAA AAA GGT GGA CAG ACA AGA 746
Arg Lys Gly Glu Phe Glu Thr Gly Phe Glu Lys Gly Gly Gln Thr Arg
190 195 200

GAA CAT GOG ATG TTT GGC AAA ACG GCA GGA CTA AAA CAT TTA ATA GTG 794
Glu His Ala Met Phe Gly Lys Thr Ala Gly Val Lys His Leu Ile Val
205 210 215

CTT ATT AAT AAG ATG GAT GAT CCC ACA GTA AAT TGG GGC ATC GAG AGA 842
Leu Ile Asn Lys Met Asp Asp Pro Thr Val Asn Trp Gly Ile Glu Arg
220 225 230

TAT GAA GAA TGT AAA GAA AAA CTG GTG CCC TTT TTG AAA AAA CTA GGC 890
Tyr Glu Glu Cys Lys Glu Lys Leu Val Pro Phe Leu Lys Lys Val Gly
235 240 245

TTT ACT CCA AAA AAG GAC ATT CAC TTT ATG CCC TGC TCA GGA CTG ACC 938
Phe Ser Pro Lys Lys Asp Ile His Phe Met Pro Cys Ser Gly Leu Thr
250 255 260 265

GGA GCA AAT ATT AAA GAG CAG TCA GAT TTC TGC OCT TGG TAG ACT GGA 986
Gly Ala Asn Ile Lys Glu Gln Ser Asp Phe Cys Pro Trp Tyr Thr Gly
270 275 280

TTA CCA TTT ATT COG TAT TTG AAT AAC TTG CCA AAC TTC AAC AGA TCA 1034
Leu Pro Phe Ile Pro Tyr Leu Asn Asn Leu Pro Asn Phe Asn Arg Ser
290 295

ATT GAT GGA OCA ATA AGA CTG OCA ATT GTG GAT AAG TAC AAA GAT ATG
Ile Asp Gly Pro Ile Arg Leu Pro Ile Val Asp Lys Tyr Lys Asp Met
300 305 310

GGC ACT GTG GTC CTG GGA AAG CTG GAA TOC GGG TOC ATT TIT AAA GGC
Gly Thr Val Val Leu Gly Lys Leu Glu Ser Gly Ser Ile Phe Lys Gly
315 320 325

CAG CAG CTC GTG ATG ATG CCA AAC AAG CAC AAT GTA GAA GTT CTT GGA
Gln Gln Leu Val Met Met Pro Asn Lys His Asn Val Glu Val Leu Gly
330 335 340 345

ATA CTT TCT GAT GAT ACT GAA ACT GAT TTT GTA GCC OCA GGT GAA AAC
Ile Leu Ser Asp Asp Thr Glu Thr Asp Phe Val Ala Pro Gly Glu Asn
350 355 360

CTC AAA ATC AGA CTG AAG GGA ATT GAA GAA GAA GAG ATT CTT CCA GAA
Leu Lys Ile Arg Leu Lys Gly Ile Glu Glu Glu Glu Ile Leu Pro Glu
365 370 375

TTC ATA CTT TGT GAT OCT AGT AAC CTC TGC CAT TCT GGA GGC AOG TTT
Phe Ile Leu Cys Asp Pro Ser Asn Leu Cys His Ser Gly Arg Thr Phe
380 385 390

GAT GTT CAG ATA GTG ATT ATT GAG CAC AAA TCC ATC ATC TGC CCA GGT
Asp Val Gln Ile Val Ile Ile Glu His Lys Ser Ile Ile Cys Pro Gly
395 400 405

TAT AAT GCG GTG CTG CAC ATT CAT ACT TGT ATT GAG GAA GTT GAG ATA
Tyr Asn Ala Val Leu His Ile His Thr Cys Ile Glu Glu Val Glu Ile
410 415 420 425

ACA GOG TTA ATC TCC TTG GTA GAC AAA AAA TCA GGG GAA AAA AGT AAG
Thr Ala Leu Ile Ser Leu Val Asp Lys Lys Ser Gly Glu Lys Ser Lys
430 435 440

ACA OGA OOC GGC TTC GTG AAA GAA GAT CAA GTA TGC ATT GCT OGT TTA
Thr Arg Pro Arg Phe Val Lys Gln Asp Gln Val Cys Ile Ala Arg Leu
445 450 455

AGG ACA GCA GGA ACC ATC TGC CTC GAG ACG TTC AAA GAT TTT OCT CAG
Arg Thr Ala Gly Thr Ile Cys Leu Glu Thr Phe Lys Asp Phe Pro Gln
460 465 470

ATG GGT OGT TTT ACT TTA AGA GAT GAG GGT AAG ACC ATT GCA ATT GGA
Met Gly Arg Phe Thr Leu Arg Asp Glu Gly Lys Thr Ile Ala Ile Gly
475 480 485

AAA GTT CTG AAA TTG GTC CCA GAG AAG GAC TAAGCAATTT TCTTGATGCC 1660
Lys Val Leu Lys Leu Val Pro Glu Lys Asp
490 495

TCTGCAAGAT ACTGTGAGGA GAATTGACAG CAAAAGTTCA OCAOCTACTC TTATTTACTG 1720
CCCATTGATT GACTTTTCTT CATATTTTGC AAAGAGAAAT TTCACAGCAA AAATTCATGT 1780
TTTGTCAGCT TTCTCATGTT GAGATCTGTT ATGTCACTGA TGAATTTAOC CTCAAGTTTC 1840
CTTOCTCTCT ACCACTCTGC TTCCTTGGAC AATATCAGTA ATAGCTTTGT AAGTGATGTG 1900
GAOGTAATTG OCTACAGTAA TAAAAAAATA ATCTACTTTA ATTTTTCATT TTCTTTTAGG 1960
ATATTTAGAC CAOOCTTGTT CCAOGCAAAC CAGAGTGTGT CAGTGTTTGT GTGTGTCTTA 2020
AAATGATAAC TAACATGTGA ATAAAATACT OCATTTG 2057



Claims 

1. A GDP dissociation stimulating protein gene comprises a nucleotide sequence coding for the amino acid sequence
shown under SEQ ID NO:1.

2. A GDP dissociation stimulating protein gene comprises the nucleotide sequence shown under SEQ ID NO:2.

3. A GDP dissociation stimulating protein gene as defined in Claim 2 which has the nucleotide sequence shown under
SEQ ID NO:3.

4. A brain-specific nucleosome assembly protein gene comprises a nucleotide sequence coding for the amino acid
sequence shown under SEQ ID NO:19.

5. A brain-specific nucleosome assembly protein gene comprises a nucleotide sequence shown under SEQ ID
NO:20.

6. A brain-specific nucleosome assembly protein gene as defined in Claim 5 which has the nucleotide sequence
shown under SEQ ID NO:21.

7. A human skeletal muscle-specific ubiquitin-conjugating enzyme gene comprises a nucleotide sequence coding for
the amino acid sequence shown under SEQ ID NO:22.

8. A human skeletal muscle-specific ubiquitin-conjugating enzyme gene comprises the nucleotide sequence shown
under SEQ ID NO:23.

9. A human skeletal muscle-specific ubiquitin-conjugating enzyme gene as defined in Claim 8 which has the
nucleotide sequence shown under SEQ ID NO:24.

10. A TMP-2 gene comprises a nucleotide sequence coding for the amino acid sequence shown under SEQ ID NO:25.

11. A TMP-2 gene comprises the nucleotide sequence shown under SEQ ID NO:26.

12. A TMP-2 gene as defined in Claim 11 which has the nucleotide sequence shown under SEQ ID NO:27.

13. A human NPIK gene comprises a nucleotide sequence coding for the amino acid sequence shown under SEQ ID
NO:28.

14. A human NPIK gene comprises the nucleotide sequence shown under SEQ ID NO:29.

15. A human NPIK gene as defined in Claim 14 which has the nucleotide sequence shown under SEQ ID NO:30.

16. A human NPIK gene comprises a nucleotide sequence coding for the amino acid sequence shown under SEQ ID
NO:31.

17. A human NPIK gene comprises the nucleotide sequence shown under SEQ ID NO:32.

18. A human NPIK gene as defined in Claim 17 which has the nucleotide sequence shown under SEQ ID NO:33.

19. A nel-related protein type 1 gene comprises a nucleotide sequence coding for the amino acid sequence shown
under SEQ ID NO:34.

20. A nel-related protein type 1 gene comprises the nucleotide sequence shown under SEQ ID NO:35.

21. A nel-related protein type 1 gene as defined in Claim 20 which has the nucleotide sequence shown under SEQ ID
NO:36.

22. A nel-related protein type 2 gene comprises a nucleotide sequence coding for the amino acid sequence shown
under SEQ ID NO:37.

23. A nel-related protein type 2 gene comprises the nucleotide sequence shown under SEQ ID NO:38.

24. A nel-related protein type 2 gene as defined in Claim 23 which has the nucleotide sequence shown under SEQ ID
NO:39.

25. A method for the in vitro diagnosis of hereditary diseases and cancer, characterized by employing any of the
nucleotide or amino acid sequences as given in claims 1-24.

26. The use of any of the nucleotide or amino acid sequences as given in claims 1-24 for in vitro diagnosis as well as
for the preparation of a pharmaceutical for the treatment of diseases.

The Search Division considers that the present European patent application does not comply with the
requirements of unity of invention and relates to several inventions or groups of inventions, namely:

1. Claims: 25, 26 partially and 1-3

GDP-dissociation-stimulating protein gene, and
corresponding method for in vitro diagnosing and use
for the preparation of a pharmaceutical

2. Claims: 25, 26 partially and 4-6

brain-specific nucleosome assembly protein gene,
and corresponding method for in vitro diagnosing and use
for the preparation of a pharmaceutical

3. Claims: 25, 26 partially and 7-9

human skeletal-muscle-specific ubiquitin-conjugating enzyme
gene, and corresponding method for in vitro diagnosing and
use for the preparation of a pharmaceutical

4. Claims: 25, 26 partially and 10-12

TMP-2 cell proliferation gene and corresponding method for
in vitro diagnosing and use for the preparation of a
pharmaceutical

5. Claims: 25, 26 partially and 13-18

human NPIK phosphatidylinositol kinase genes and
corresponding method for in vitro diagnosing and use for the
preparation of a pharmaceutical

6. Claims: 25, 26 partially and 19-24

nel-related protein genes and corresponding method for in
vitro diagnosing and use for the preparation of a
pharmaceutical
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